Abstract

The focus of this paper is on spatial precoding in correlated multi-antenna channels, where the number of 
independent data-streams is adapted to trade-off the data-rate with the transmitter complexity. Towards 
the goal of a low-complexity implementation, a structured precoder is proposed, where the precoder 
matrix evolves fairly slowly at a rate comparable with the statistical evolution of the channel. Here, the 
eigenvectors of the precoder matrix correspond to the dominant eigenvectors of the transmit covariance 
matrix, whereas the power allocation across the modes is fixed, known at both the ends, and is of 
low-complexity. A particular case of the proposed scheme (semiunitary precoding), where the spatial 
modes are excited with equal power, is shown to be near-optimal in matched channels. A matched 
channel is one where the dominant eigenvalues of the transmit covariance matrix are well-conditioned 
and their number equals the number of independent data-streams, and the receive covariance matrix is 
also well-conditioned. In mismatched channels, where the above conditions are not met, it is shown that 
the loss in performance with semiunitary precoding when compared with a perfect channel information 
benchmark is substantial. This loss needs to be mitigated via limited feedback techniques that provide 
partial channel information to the transmitter. More importantly, we develop matching metrics that 
capture the degree of matching of a channel to the precoder structure continuously, and allow ordering 
two matrix channels in terms of their mutual information or error probability performance. 
I. Introduction

Multiple antenna communications has received significant attention over the last decade as 
a mechanism to increase the rate of information transfer, or the reliability of signal reception, 
or a combination of the two. The focus of this work is on point-to-point spatial precoding 
systems, where the number of independent data-streams is constrained to be a subset, M,
of the transmit dimension so as to minimize the complexity and the cost associated with 
transmission. Initial works on precoding study optimal signaling strategies when perfect channel 
state information (CSI) is available at the transmitter and the receiver. These studies show that 
a channel diagonalizing input that corresponds to exciting the dominant M-dimensional eigen- 
space of the channel, with a power allocation that can be computed via waterfilling, is robust 
under different design metrics [1]-[9].

Although perfect CSI provides a benchmark on the performance, it is difficult to obtain in 
practice. More importantly, the system performance is not robust under CSI uncertainty. Even a 
small error in the CSI at the transmitter can lead to a dramatic degradation in performance with 
a scheme that is designed for the mismatched CSI [10]-[14]. Furthermore, even if perfect CSI
is available, tight constraints on complexity as well as energy consumption [15]-[19] at the RF
level in the mobile ends may disallow the implementation of optimal solutions in practice. This 
is because Third Generation wireless systems and beyond are expected to be multi-carrier in 
nature and the burden of computing the optimal input is magnified by the number of sub-carriers 
and the rate of evolution of the channel realizations. Besides this, the structure of the input could 
change, often dramatically, at the rate of evolution of the channel realizations, which also makes 
it difficult to implement. These reasons suggest that a slower rate of adaptation of the input 
signals, that is of low complexity and is more robust to CSI uncertainty, is preferred in practice. 

In realistic wireless systems, where the channels are spatio-temporally correlated, the slow 
rate of statistical evolution implies that it is reasonable to assume perfect statistical knowledge 
of the channel at the transmitter. Since the spatial statistics experienced by the individual sub- 
carriers are identical [20]-[22], the burden of computing the optimal input with only the statistical 
information at the transmitter is equivalent to that of a narrowband system. Even in this setting, 
optimal precoding has been studied for different spatial correlation models [10], [11], [21], 
[23]-[32]. These works show that the eigen-directions of the optimal input covariance matrix
correspond to a set of the M-dominant eigenvectors of the transmit covariance matrix and are 
hence, easily adaptable to changes in statistics. However, computing the power allocation across 

1 The number of data-streams, M, is such that $1 \leq M \leq N_t$, with $N_t$ denoting the transmit antenna dimension. Note that M
is the rank of the input covariance matrix and the number of radio-frequency (RF) link chains as well.
the M modes requires Monte Carlo averaging or gradient descent-type approaches [10], [11],
[21], [28], [29]. While the computational complexity of the power allocation algorithm may be 
affordable at the base station end, whether it is possible or not at the mobile end is questionable. 
Moreover, there has been no systematic study of statistics-based precoding approaches and hence,
it is not clear how far the performance of the statistical scheme is from the perfect
CSI benchmark.

It should be noted that all the above works study precoder design with an emphasis on obtaining 
information-theoretic limits on performance. In contrast, our focus here is on low-complexity 
schemes that can be easily implemented and easily adapted to changes in channel statistics. In 
this work, we consider a narrowband setup where spatial correlation is modeled by a general 
decomposition [28], [33], [34] that: 1) Is based on physical principles, 2) Has been verified by 
many recent measurement campaigns, and 3) Includes as special cases the well-studied i.i.d.
model, the separable correlation model [35], and the virtual representation [20], [21], [36]. 

We propose the notion of structured precoding, where the power allocation across the M spatial 
modes is fixed and known at both the ends. Two specific cases are studied in depth in this work: 
1) A statistical semiunitary precoder, where the eigen-directions of the input correspond to the
dominant eigenvectors of the transmit covariance matrix and the power allocation is uniform, is 
studied theoretically. 2) A precoder, where the eigen-directions are as before, and the power is 
allocated proportionate to the transmit covariance matrix eigenvalues below a threshold signal- 
to-noise ratio (SNR) and uniformly above this SNR, is studied via simulations. Following the 
philosophy propounded here, more complicated schemes, where the power allocation across the 
modes can be computed with low-complexity, possibly as a function of the SNR and the statistics, 
can also be considered. 

Our focus is on two questions: 1) When is the first scheme near-optimal with respect to a
perfect CSI benchmark?, and 2) What is the "gap" in performance and how does it depend on the
system and the channel parameters? The performance metric used in this work is relative average
mutual information loss. We also study relative uncoded error probability enhancement and
relative mean-squared error (MSE) enhancement, whenever they can be characterized analytically.

The answers to the above questions lie in the notion of matched and mismatched channels, 
which are introduced in this work. A matched channel is one where the channel is effectively 
matched to the precoding scheme with the following two conditioning properties being true: 1) 

2 I.I.D. stands for independent and identically distributed. 

3 An $N_t \times M$ matrix $X$ with $M \leq N_t$ is said to be semiunitary if it satisfies $X^H X = I_M$.

4 This gap can possibly be bridged with a limited feedback scheme [12]-[14], [37] that provides partial channel information
to the transmitter.
The M-dominant eigenvalues of the transmit covariance matrix are well-conditioned, whereas
the remaining $(N_t - M)$ eigenvalues are ill-conditioned away from the dominant ones, and 2)
The receive covariance matrix is also well-conditioned. A mismatched channel is one where both 
the transmit and the receive covariance matrices are ill-conditioned, with the additional condition 
that rank(H) > M with probability 1. 

We show that matched and mismatched channels correspond to the cases where the relative
performance of the semiunitary precoder is closest to and farthest from that of the perfect CSI precoder,
respectively. The degree of channel-to-precoder scheme matching can be abstractly measured
with matching metrics, that are also introduced in this work. As a by-product of our study, 
we also show that the semiunitary precoder is near-optimal in the relative antenna asymptotic 
setting for any channel. This paper generalizes previous work [14] on the beamforming case 
(M = 1), where we studied the performance of the statistical beamforming scheme. 
Organization: After elucidating the system model in Section II, we benchmark the structure
of the optimal structured precoder in the perfect CSI case in Section III. Using tools from
majorization theory, we show that the optimal input naturally extends the channel-diagonalizing
input from the unconstrained case [1]-[9]. In Section IV, we elaborate on the problem setup of
structured precoding. In Sections V-VII, using tools from random matrix theory and eigenvector
perturbation theory, we study the asymptotic (in antenna dimensions) performance of a statistical
semiunitary precoder that excites the M-dominant eigenvectors of the transmit covariance matrix.
We provide numerical studies to illustrate the benefits of the proposed precoding scheme under
realistic system assumptions in Section VIII, with a discussion of our results and conclusions in
Section IX. Proofs of most of the claims have been relegated to the appendices.
Notation: The M-dimensional identity matrix is denoted by $I_M$. The $(i,j)$-th and $i$-th diagonal
entries of a matrix $X$ are denoted by $X(i,j)$ and $X(i)$, respectively. In more complicated
settings (for example, when the matrix $X$ is represented as a product or sum of many matrices),
the above entries are denoted by $X_{ij}$ and $X_i$, respectively. The complex conjugate, conjugate
transpose, regular transpose and inverse operations are denoted by $(\cdot)^*$, $(\cdot)^H$, $(\cdot)^T$ and $(\cdot)^{-1}$, while
the expectation, the trace and the determinant operators are given by $E[\cdot]$, $\mathrm{Tr}(\cdot)$ and $\det(\cdot)$,
respectively. The t-dimensional complex vector space is denoted by $\mathbb{C}^t$. The standard big-Oh
($O$) and small-oh ($o$) notations are used along with the standard ordering for eigenvalues of an
$n \times n$-dimensional Hermitian matrix $X$: $\lambda_1(X) \geq \cdots \geq \lambda_n(X)$. The largest and the smallest
eigenvalues are often denoted also by $\lambda_{\max}(X)$ and $\lambda_{\min}(X)$, respectively. The notation $x^+$ stands

5 If $\Lambda_t(1) \geq \cdots \geq \Lambda_t(M)$ denote the first M eigenvalues of the transmit covariance matrix and $\Lambda_t(1)/\Lambda_t(M)$ is (or is not)
significantly larger than 1, we loosely say that these eigenvalues are ill-(or well-)conditioned.
6 That is, when jf- -> or oo as {M, N t ,N r } -» oo. 
for max(x, 0).

II. System Setup 

We consider a communication model with $N_t$ transmit and $N_r$ receive antennas, where M
($1 \leq M \leq N_t$) independent data-streams are used in signaling. That is, the M-dimensional input
vector $s$ is precoded into an $N_t$-dimensional vector via the $N_t \times M$ precoding matrix $F$ and
transmitted over the channel. The discrete-time baseband signal model used is

$$y = HFs + n, \tag{1}$$

where $y$ is the $N_r$-dimensional received vector, $H$ is the $N_r \times N_t$-dimensional channel matrix, and
$n$ is the $N_r$-dimensional (zero mean, unit variance) additive white Gaussian noise. In practice,
the choice of M is decided based on a trade-off between complexity, cost and performance gain. 

A. Channel Model 

The main emphasis of this work is on the impact of spatial correlation. We isolate the spatial 
aspect by assuming a block fading, narrowband model for the time-frequency correlation of H. 
It is well-known that Rayleigh fading (zero mean complex Gaussian) is an accurate model for 
H in a non line-of-sight setting and hence, the complete spatial statistics are described by the 
second-order moments of {H(i,j)}. 

The most general, mathematically tractable spatial correlation model is a canonical decomposition
of the channel along the transmit and the receive covariance bases [28], [33], [34]. In
this model, we assume that the auto- and the cross-covariance matrices of all rows of $H$ have
the same eigen-basis (denoted by $U_t$), and the auto- and the cross-covariance matrices of all the
columns of $H$ have the same eigen-basis (denoted by $U_r$). Thus, we can decompose $H$ as

$$H = U_r H_{\mathrm{ind}} U_t^H, \tag{2}$$

where $H_{\mathrm{ind}}$ has independent, but not necessarily identically distributed entries, and $U_t$ and $U_r$
are unitary matrices. The transmit and the receive covariance matrices are defined as

$$\Sigma_t = E[H^H H] = U_t \, E[H_{\mathrm{ind}}^H H_{\mathrm{ind}}] \, U_t^H = U_t \Lambda_t U_t^H, \tag{3}$$
$$\Sigma_r = E[H H^H] = U_r \, E[H_{\mathrm{ind}} H_{\mathrm{ind}}^H] \, U_r^H = U_r \Lambda_r U_r^H, \tag{4}$$

where $\Lambda_t = E[H_{\mathrm{ind}}^H H_{\mathrm{ind}}]$ and $\Lambda_r = E[H_{\mathrm{ind}} H_{\mathrm{ind}}^H]$ are diagonal.
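The decomposition in (2) and the covariance relations in (3) and (4) can be checked numerically. The following is a minimal Python sketch, assuming NumPy is available; the dimensions, the variance profile `sigma2`, and the helper names `random_unitary` and `sample_channel` are hypothetical choices made for illustration and are not taken from the original work.

```python
import numpy as np

def random_unitary(n, rng):
    """Draw an n x n unitary matrix via QR of a complex Gaussian matrix."""
    z = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    q, _ = np.linalg.qr(z)
    return q

def sample_channel(U_r, U_t, sigma2, rng):
    """One draw of H = U_r H_ind U_t^H with H_ind(i, j) ~ CN(0, sigma2[i, j])."""
    g = (rng.standard_normal(sigma2.shape)
         + 1j * rng.standard_normal(sigma2.shape)) / np.sqrt(2)
    H_ind = np.sqrt(sigma2) * g
    return U_r @ H_ind @ U_t.conj().T

rng = np.random.default_rng(0)
N_r, N_t = 3, 4
U_r, U_t = random_unitary(N_r, rng), random_unitary(N_t, rng)
sigma2 = rng.uniform(0.1, 2.0, size=(N_r, N_t))  # hypothetical variance profile

# Diagonal matrices implied by (3) and (4).
Lambda_t = np.diag(sigma2.sum(axis=0))  # column sums: E[H_ind^H H_ind]
Lambda_r = np.diag(sigma2.sum(axis=1))  # row sums:    E[H_ind H_ind^H]
Sigma_t = U_t @ Lambda_t @ U_t.conj().T

# Empirical estimate of E[H^H H] by Monte Carlo averaging.
n_draws = 20000
acc = np.zeros((N_t, N_t), dtype=complex)
for _ in range(n_draws):
    H = sample_channel(U_r, U_t, sigma2, rng)
    acc += H.conj().T @ H
Sigma_t_hat = acc / n_draws
rel_err = np.linalg.norm(Sigma_t_hat - Sigma_t) / np.linalg.norm(Sigma_t)
```

Setting all entries of `sigma2` equal recovers the i.i.d. model, and a rank-one (outer product) profile recovers a separable structure, which is one way to see how the special cases fit inside this model.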

7 This model is referred to as the "eigen-beam or beamspace model" in [33] and is used in capacity analysis in [28]. 
Under certain special cases, the model in (2) reduces to some well-known spatial correlation
models such as the i.i.d. model, the separable correlation [35] and the virtual representation [20], 
[21], [36] frameworks. The readers are referred to [13] for details. The i.i.d. model, while 
being analytically tractable, is unrealistic for applications where large antenna spacings or a rich 
scattering environment are not possible. Even though the separable model may be an accurate fit 
under certain channel conditions [38], deficiencies acquired by the separability property result 
in misleading estimates of system performance [34], [39], [40]. The readers are referred to [33], 
[39], [41] for more details on how the canonical, and more specifically the virtual model fit 
measured data better. Given a correlated channel, in this work, we will assume without any loss
in generality that $M \leq \mathrm{rank}(\Lambda_t) \leq N_t$.

B. Channel State Information 

Initial works in the precoding literature have assumed perfect CSI at both the transmitter and 
the receiver. Perfect CSI at the receiver (the coherent case) is usually reasonable for systems 
that adopt a 'training followed by signaling' model. On the other hand, both the perfect and the 
no CSI assumptions at the transmitter are unrealistic, being too optimistic and too pessimistic, 
respectively. This is so because the perfect CSI condition imposes a huge burden on the training 
or the feedback apparatus on the reverse link while on the other hand, the spatial statistics of 
the channel entries evolve over much slower timescales and can be learned at both the ends. In 
this work, we study the coherent case with perfect statistical knowledge at the transmitter. 

C. Transceiver Architecture 

The transmitted vector $Fs$ (see (1)) has a power constraint $\rho$. The transmit power constraint
can be rewritten as

$$\rho = E[s^H F^H F s] = \mathrm{Tr}\left( E[F s s^H F^H] \right) = \mathrm{Tr}\left( F Q_s F^H \right), \quad Q_s \triangleq E[s s^H]. \tag{5}$$

By decomposing $F$ and $Q_s$ using singular value decomposition (SVD) and renormalizing, it can
be seen that the system equation can be written as:

$$y = \sqrt{\frac{\rho}{M}} \, HFs + n, \quad F = V_F \Lambda_F^{1/2}, \tag{6}$$

where $V_F$ is an $N_t \times M$ semiunitary matrix, $\Lambda_F$ is an $M \times M$ non-negative definite power
shaping (allocation) matrix with $\mathrm{Tr}(\Lambda_F) \leq M$, and $s$ is an $M \times 1$ vector with i.i.d. components
that have zero mean and variance one. That is, the general precoder can be thought of as a power
loading by $\Lambda_F$, followed by a rotation with $V_F$.
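To make the "power loading followed by rotation" structure concrete, here is a small Python sketch (NumPy assumed). It builds a semiunitary rotation and a diagonal power loading and checks the transmit power. The scaling by sqrt(rho/M) on the transmitted vector, the dimensions, and the loading `lam_F` are assumptions of this example.

```python
import numpy as np

rng = np.random.default_rng(4)
N_t, M, rho = 4, 2, 8.0

# Semiunitary rotation: orthonormal columns from a reduced QR factorization.
Z = rng.standard_normal((N_t, M)) + 1j * rng.standard_normal((N_t, M))
V_F, _ = np.linalg.qr(Z)

lam_F = np.array([1.5, 0.5])   # hypothetical power loading with trace equal to M
F = V_F * np.sqrt(lam_F)       # scales column k by sqrt(lam_F[k])

# Average power of sqrt(rho/M) * F s when s has i.i.d. unit-variance entries.
tx_power = (rho / M) * np.real(np.trace(F @ F.conj().T))
```

Because the columns of `V_F` are orthonormal, the transmit power depends only on the trace of the power loading, so any loading with trace M meets the constraint with equality.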
The optimal reception strategy of the input symbols corresponds to non-linear maximum
likelihood (ML) decoding. However, the exponential complexity of ML decoding in both antenna
dimensions and coherence length implies that simpler receiver architectures are preferred. In this
work, we assume a linear minimum mean-squared error (MMSE) receiver. With this receiver,
the symbol corresponding to the k-th data-stream is recovered by projecting the received signal
$y$ on to the $N_r \times 1$ vector

$$g_k = \sqrt{\frac{\rho}{M}} \left( \frac{\rho}{M} H F F^H H^H + I_{N_r} \right)^{-1} H f_k, \tag{7}$$

where $f_k$ is the k-th column of $F$. That is, the recovered symbol is $\hat{s}(k) = g_k^H y$, and the
signal-to-interference-noise ratio (SINR) at the output of the linear filter is

$$\mathrm{SINR}_k = \frac{1}{\left[ \left( I_M + \frac{\rho}{M} F^H H^H H F \right)^{-1} \right]_k} - 1, \tag{8}$$

where $[\cdot]_k$ denotes the k-th diagonal entry. Also, note that the MSE of the k-th data-stream, $\mathrm{MSE}_k$, is given by $\left[ \left( I_M + \frac{\rho}{M} F^H H^H H F \right)^{-1} \right]_k$.
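The relation between the per-stream MSE and SINR of the linear MMSE receiver can be checked numerically. The Python sketch below (NumPy assumed) computes both from the MSE matrix and cross-checks one stream against the SINR measured directly at the filter output. It assumes the signal model with the sqrt(rho/M) scaling and unit-variance white noise; the function names `mmse_metrics` and `mmse_filter` and all numerical values are hypothetical.

```python
import numpy as np

def mmse_metrics(H, F, rho):
    """Per-stream MSE and SINR for y = sqrt(rho/M) H F s + n (unit-variance s, n)."""
    M = F.shape[1]
    G = F.conj().T @ H.conj().T @ H @ F              # F^H H^H H F
    E = np.linalg.inv(np.eye(M) + (rho / M) * G)     # MSE matrix
    mse = np.real(np.diag(E))
    return mse, 1.0 / mse - 1.0

def mmse_filter(H, F, rho, k):
    """MMSE receive vector for stream k."""
    N_r, M = H.shape[0], F.shape[1]
    A = np.sqrt(rho / M) * H @ F                     # effective channel
    return np.linalg.solve(A @ A.conj().T + np.eye(N_r), A[:, k])

rng = np.random.default_rng(1)
N_r, N_t, M, rho = 4, 5, 2, 10.0
H = (rng.standard_normal((N_r, N_t)) + 1j * rng.standard_normal((N_r, N_t))) / np.sqrt(2)
F, _ = np.linalg.qr(rng.standard_normal((N_t, M)) + 1j * rng.standard_normal((N_t, M)))

mse, sinr = mmse_metrics(H, F, rho)

# Cross-check stream 0: signal power over interference-plus-noise power.
A = np.sqrt(rho / M) * H @ F
g = mmse_filter(H, F, rho, 0)
signal = abs(g.conj() @ A[:, 0]) ** 2
interference = sum(abs(g.conj() @ A[:, j]) ** 2 for j in range(1, M))
noise = np.real(g.conj() @ g)
sinr_direct = signal / (interference + noise)
```

Working with the M x M MSE matrix rather than the individual filters avoids one linear solve per stream, which is the form used in the expressions for the SINR and the MSE.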



D. A Case for Structured Precoding 

Almost all of the current works on precoder design do not assume any specific structure on 
the precoder matrix F. This is because the main focus of these works is on characterizing the 
fundamental performance limits of precoding. That is, to study optimal signaling schemes from 
a mutual information or an error probability viewpoint. 

The structure of the optimal precoder, $F_{\mathrm{opt}}$, critically depends on the knowledge of the
eigenspace of $H$ (see Sec. III). Even a small inaccuracy in the knowledge of the eigenspace of
H could lead to a precoder with a significantly degraded performance [10]-[14]. While this issue 
does not arise in the perfect CSI case, it is critical in systems with imperfect CSI. In particular, 
imperfect channel knowledge arises in practice due to constraints on the quality and frequency 
of channel or statistical feedback and channel estimation at the receiver. 

Moreover, even if perfect CSI is available at the transmitter, the efficient utilization of this information
is constrained by fundamental limits on the energy per bit at the computational
or processing level [15]-[19]. These limits in turn imply that a large number of computations are
difficult to realize in low-power devices, such as those found at the mobile ends. For example,
the move towards multi-carrier signaling and the fast rate at which channel realizations evolve 
leads to computational limits on how many SVD operations can be afforded. Another key aspect 
to note is that the eigenspace of the optimal input could change dramatically from one channel 

8 By structure, we mean a set of eigenvectors and eigenvalues of $F_{\mathrm{opt}}$, that are captured by $V_{F_{\mathrm{opt}}}$ and $\Lambda_{F_{\mathrm{opt}}}$ in (6).
realization to the next, and this poses constraints on the adaptivity of the solutions proposed in
the literature. In fact, RF design constraints imposed by the above limits are often the principal 
stumbling blocks in realizing multi-antenna systems in practice. The readers are referred to [18] 
for a broad array of RF design challenges, imposed by computational and complexity constraints. 

All of the above reasons suggest that it may not be possible for F to be designed at an 
arbitrarily fast rate. They also suggest that F cannot have arbitrary structure and one cannot 
learn it with arbitrarily fine precision. The case of statistical precoding, where the optimal input 
is adapted in response to the statistical information has thus received significant attention. In this 
case, computing the optimal power allocation across the excited modes requires either Monte
Carlo averaging or gradient descent-type approaches (see Sec. IV). The affordability of the
complexity of these approaches at the mobile end is again questionable. 

These reasons motivate us to study structured precoding, where the eigen-modes as well as 
the power allocation across them are determined via low-complexity operations on the channel 
statistics. The additional structure imposed on F serves the following purposes: 1) Isolating 
the impact of inaccuracy in the singular vectors and singular values of F on performance with 
respect to a genie-aided design, 2) Given that there are resource constraints on the reverse link 
quantization, identifying those features of the channel H that require an appropriate resource 
allocation so as to optimize system performance, and 3) Obtaining more realistic 'intermediate' 
benchmarks for systems in practice. 

We first focus on a specific class of semiunitary precoder, where $\Lambda_F = I_M$. We then consider
the more general structured precoder case, where $\Lambda_F$ is fixed, but is chosen different from the
identity matrix.

III. Perfect CSI Benchmark for Structured Precoding 

Towards the eventual goal of studying a structured statistical precoding scheme, we first 
characterize the optimal perfect CSI benchmark in this section. 

A. Unconstrained Precoders 

If only one data-stream is excited (M = 1), the received SNR is given by $\rho \, \frac{|z^H H f|^2}{z^H z}$, where
$f$ is the beamforming vector and $z$ is the combining vector. It is straightforward to note that
the jointly optimal design of $z$ and $f$ can be reduced to a beamformer design by using the
combining vector $\frac{Hf}{\|Hf\|}$, and that the optimal choices $f_{\mathrm{opt}}$ and $z_{\mathrm{opt}}$ are the dominant right
singular vector of $H$ and $\frac{H f_{\mathrm{opt}}}{\sqrt{\lambda_{\max}(H^H H)}}$, respectively [42]. In this case, the received SNR coincides
with $\rho \, \lambda_{\max}(H^H H)$.
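The beamforming statement can be illustrated with a short Python sketch (NumPy assumed). It takes the received SNR to be rho |z^H H f|^2 / (z^H z) for a unit-norm beamformer, which is an assumption of this example, and checks that the dominant right singular vector with its matched combiner attains rho times the largest eigenvalue of H^H H. The dimensions and the helper name `received_snr` are hypothetical.

```python
import numpy as np

def received_snr(H, f, z, rho):
    """rho |z^H H f|^2 / (z^H z) for a unit-norm beamformer f."""
    return rho * abs(z.conj() @ H @ f) ** 2 / np.real(z.conj() @ z)

rng = np.random.default_rng(2)
N_r, N_t, rho = 3, 4, 5.0
H = (rng.standard_normal((N_r, N_t)) + 1j * rng.standard_normal((N_r, N_t))) / np.sqrt(2)

# numpy returns V^H with singular values in decreasing order.
_, s, Vh = np.linalg.svd(H)
f_opt = Vh[0].conj()               # dominant right singular vector
z_opt = H @ f_opt / s[0]           # s[0] equals sqrt(lambda_max(H^H H))

snr_opt = received_snr(H, f_opt, z_opt, rho)
lam_max = np.linalg.eigvalsh(H.conj().T @ H)[-1]

# Random unit-norm beamformers with matched combiners should never do better.
best_random = 0.0
for _ in range(200):
    f = rng.standard_normal(N_t) + 1j * rng.standard_normal(N_t)
    f /= np.linalg.norm(f)
    best_random = max(best_random, received_snr(H, f, H @ f, rho))
```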
In contrast to beamforming, the precoding case with M > 1 requires a recourse to the study of
eigenvalues of products of Hermitian matrices. For the (general) unconstrained precoding case,
the joint precoder-equalizer design turns out to have a channel diagonalizing structure. To state
this result, we need some additional notation. Let an SVD of $H$ be given by $H = U_H \Lambda_H V_H^H$,
where $V_H = [v_1 \cdots v_{N_t}]$. Without any loss in generality, we assume that the non-trivial singular
values of $H$ are arranged in the standard order.

Lemma 1: The optimal choices of $V_{F_{\mathrm{opt}}}$ and $\Lambda_{F_{\mathrm{opt}}}$ in (6) are as follows: $V_{F_{\mathrm{opt}}}$ corresponds to
$[v_1 \cdots v_M]$, and the diagonal entries of $\Lambda_{F_{\mathrm{opt}}}$ are obtained via waterfilling.

Proof: The optimality of the channel diagonalizing structure has been proved in [1]-[4],
with the design metric being the average MSE of the data-streams. Other design metrics where
the channel diagonalizing structure is optimal include weighted MSE of the data-streams [5], [6],
determinant of the MSE matrix [7], and a peak-power constraint metric [8]. A unified convex
programming framework for precoder optimization is proposed in [9] by studying two broad
classes of functions: Schur-concave and Schur-convex functions. In [9], the authors show that
most of the above design criteria can be formulated as either a Schur-concave or Schur-convex
function of the MSE and the channel diagonalizing structure is optimal in either case. ■
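The exact waterfilling rule in Lemma 1 depends on the design metric. As an illustration only, the Python sketch below (NumPy assumed) implements the classical mutual-information form, which maximizes the sum of log(1 + g_k p_k) under a total power budget. Choosing this particular objective, the function name `waterfill`, and the example eigenvalues are assumptions of this sketch rather than the construction used in the cited works.

```python
import numpy as np

def waterfill(gains, total_power):
    """Maximize sum(log(1 + gains[k] * p[k])) s.t. sum(p) = total_power, p >= 0.

    The solution is p[k] = max(mu - 1/gains[k], 0), with the water level mu
    fixed by the power budget.
    """
    gains = np.asarray(gains, dtype=float)
    order = np.argsort(gains)[::-1]              # strongest mode first
    inv = 1.0 / gains[order]
    p_sorted = np.zeros_like(inv)
    for n_active in range(len(inv), 0, -1):
        mu = (total_power + inv[:n_active].sum()) / n_active
        if mu > inv[n_active - 1]:               # weakest active mode gets power
            p_sorted[:n_active] = mu - inv[:n_active]
            break
    p = np.zeros_like(p_sorted)
    p[order] = p_sorted
    return p

M, rho = 3, 2.0
lam = np.array([4.0, 1.0, 0.1])                  # hypothetical eigenvalues of H^H H
gains = (rho / M) * lam
p = waterfill(gains, total_power=M)

objective = lambda q: np.sum(np.log1p(gains * q))
gain_over_uniform = objective(p) - objective(np.ones(M))
```

In this example the weakest mode receives no power, which is the mode-dropping behavior that makes the optimal allocation depend on both the SNR and the channel realization.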



B. Semiunitary Precoders 

When the precoders are constrained to be structured, it is intuitive (but not obvious) to expect 
a channel diagonalizing structure to be optimal. The following series of propositions elucidate 
the optimality of this structure in the semiunitary case with certain restrictions on the objective 
function. The more general structured case will be considered thereafter. The readers are referred 
to App. A for many relevant definitions and results from majorization theory. Following the
introduction from App. A, we are prepared for the following.

1) Precoders that Optimize Schur-concave Objective Functions:

Proposition 1: Let $f : \mathbb{R}^M \mapsto \mathbb{R}$ be a Schur-concave function over its domain. Also, let $f(\cdot)$
be monotonically increasing in its arguments. That is, let the univariate function $f(\cdots, x_k, \cdots) :
\mathbb{R} \mapsto \mathbb{R}$ be monotonically increasing for all k. If $\mathrm{MSE} = [\mathrm{MSE}_1 \cdots \mathrm{MSE}_M]$, then the optimal
choice of semiunitary precoder $F_{\mathrm{opt}}$ that minimizes $f(\mathrm{MSE})$ is given by

$$F_{\mathrm{opt}} = [v_1 \cdots v_M]. \tag{9}$$
Proof: See Appendix B. ■
The utility of the above proposition can be gauged from the fact that a large class of useful 
functions satisfy the Schur-concavity property. For example, from Remark 2 in App. A, we see

9 The definitions of Schur-concave and Schur-convex functions are provided in Appendix A.
that any weighted arithmetic or geometric mean of $\{\mathrm{MSE}_k\}$ (with weights chosen appropriately)
is Schur-concave. The same remark illustrates the limitations of this partitioning because the 
mutual information function cannot (in general) be expressed as a Schur-concave (or a Schur- 
convex) function of MSE. 

In the special case of Gaussian inputs, the objective function $f(\cdot)$ to be maximized is

$$f(\cdot) = \log\det\left( I_M + \frac{\rho}{M} F^H H^H H F \right) = -\log\det\left( \mathbf{E} \right), \tag{10}$$

where $\mathbf{E}$ is the mean-squared error matrix defined as

$$\mathbf{E} \triangleq E\left[ (s - \hat{s})(s - \hat{s})^H \right] = \left( I_M + \frac{\rho}{M} F^H H^H H F \right)^{-1}. \tag{11}$$

It can be shown that maximizing the mutual information with the Gaussian input (or alternately,
minimizing the determinant of $\mathbf{E}$) can be easily accommodated in the framework of
Prop. 1; see [9] for details. Alternately, an easy consequence of a lemma in App. A is
the fact that a channel diagonalizing structure maximizes mutual information and this has been
established in [43]. Also note that if $M = N_t$, any choice of $F$ unitary leads to the same value
of $f(\cdot)$. Extending the proof of [43] to the case of a non-Gaussian input requires closed-form
expressions for the mutual information, which are (in general) difficult to obtain.
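The claim that the channel diagonalizing structure maximizes the Gaussian-input mutual information over semiunitary precoders can be probed numerically. The Python sketch below (NumPy assumed) compares the log-determinant objective at the M dominant right singular vectors against randomly drawn semiunitary precoders. The sqrt(rho/M) scaling, the dimensions, and the function name `mutual_info` are assumptions of this example; a random search of this kind is a sanity check and not a proof.

```python
import numpy as np

def mutual_info(H, F, rho):
    """log det(I_M + (rho/M) F^H H^H H F), in nats."""
    M = F.shape[1]
    G = F.conj().T @ H.conj().T @ H @ F
    _, logdet = np.linalg.slogdet(np.eye(M) + (rho / M) * G)
    return logdet

rng = np.random.default_rng(3)
N_r, N_t, M, rho = 4, 6, 2, 10.0
H = (rng.standard_normal((N_r, N_t)) + 1j * rng.standard_normal((N_r, N_t))) / np.sqrt(2)

# Columns of F_opt are the M dominant right singular vectors of H.
_, _, Vh = np.linalg.svd(H)
F_opt = Vh[:M].conj().T
I_opt = mutual_info(H, F_opt, rho)

# Random semiunitary precoders from reduced QR factorizations.
I_rand = []
for _ in range(500):
    Z = rng.standard_normal((N_t, M)) + 1j * rng.standard_normal((N_t, M))
    Q, _ = np.linalg.qr(Z)
    I_rand.append(mutual_info(H, Q, rho))
I_best_random = max(I_rand)

# Right-multiplying by a unitary matrix leaves the objective unchanged.
U, _ = np.linalg.qr(rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M)))
I_rotated = mutual_info(H, F_opt @ U, rho)
```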

2) Precoders that Minimize the Average Error Probability: Besides mutual information,
uncoded error probability is another important metric that describes the performance of a
communication system. We now show how the machinery of majorization theory can be used to study
the error probability. We state the most general form of this study in the following proposition, 
with its particularization to the error probability case illustrated thereafter. 

Proposition 2: Let $h : \mathbb{R} \mapsto \mathbb{R}$ be a continuous, increasing, and convex function of its
argument. The optimal choice of $F$ that minimizes $\sum_{k=1}^{M} h(\mathrm{MSE}_k)$ is given by

$$F_{\mathrm{opt}} = [v_1 \cdots v_M] \, T, \tag{12}$$

where $T$ is an appropriately chosen unitary matrix (see App. B for details on construction).
Proof: See Appendix B. ■
If $h(\cdot)$ is as in Prop. 2 and $g : \mathbb{R}^M \mapsto \mathbb{R}$ is defined as

$$g(\mathrm{MSE}) \triangleq \sum_{k=1}^{M} h(\mathrm{MSE}_k), \tag{13}$$

then it is important to note from Lemma 7 in App. A that $g(\cdot)$ is a Schur-convex function of
MSE. Thus, in general, Prop. 2 is neither a consequence of nor implies Prop. 1.
We now show how Prop. 2 is useful in the error probability setting. Let $P_{\mathrm{err}}$ denote the
probability that at least one of the M data-streams is in error. Then,

$$P_{\mathrm{err}} = 1 - \prod_{k=1}^{M} (1 - P_k), \tag{14}$$

where $P_k$ is the probability that the k-th data-stream is in error. If some fixed constellation is
used for signaling across all the data-streams, we can write $P_k$ as

$$P_k = \alpha \, Q\left( \beta \, (\mathrm{SINR}_k)^{1/2} \right), \tag{15}$$

where $\mathrm{SINR}_k$ is the received SINR of the k-th data-stream after linear processing [44], $\alpha$ and
$\beta$ are constants dependent only on the type of the constellation, and $Q(\cdot)$ is the Q-function
associated with a standard Gaussian random variable. Assuming that the error probability of the
weakest data-stream is sufficiently small (which is reasonable for most design problems), we
have $P_{\mathrm{err}} \approx \sum_{k=1}^{M} P_k$. Alternately, one could consider a metric that measures the average error
probability of the individual data-streams: $\frac{1}{M} \sum_{k=1}^{M} P_k$. Thus, in either case, we are interested
in studying the optimal choice of precoder $F$ that minimizes $\sum_{k=1}^{M} P_k$.

It is straightforward to note that $P_k(\cdot)$ is a continuous and increasing function of MSE. Besides,
it is shown in [9] that $P_k(\cdot)$ is a convex function of MSE as long as the argument is sufficiently
small. We are thus justified in assuming that $P_k(\cdot)$ is convex, continuous and increasing in MSE.
Then, Prop. 2 shows that $P_{\mathrm{err}}$ is minimized by $F_{\mathrm{opt}}$ as in (12).
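The quality of the approximation of the block error probability in (14) by the sum of the per-stream error probabilities can be seen in a few lines of Python. This sketch uses only the standard library. The constants `alpha` and `beta` and the SINR values are hypothetical illustrative choices, and the per-stream expression follows the form of (15) as written here.

```python
import math

def q_function(x):
    """Tail probability of a standard Gaussian random variable."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def stream_error_prob(sinr, alpha, beta):
    """P_k = alpha * Q(beta * sqrt(SINR_k))."""
    return alpha * q_function(beta * math.sqrt(sinr))

def block_error_prob(p_streams):
    """Probability that at least one stream is in error: 1 - prod(1 - P_k)."""
    prod = 1.0
    for pk in p_streams:
        prod *= (1.0 - pk)
    return 1.0 - prod

alpha, beta = 1.0, math.sqrt(2.0)      # hypothetical constellation constants
sinrs = [20.0, 12.0, 8.0]              # hypothetical per-stream SINRs (linear scale)

p_streams = [stream_error_prob(x, alpha, beta) for x in sinrs]
p_exact = block_error_prob(p_streams)
p_union = sum(p_streams)               # small-error approximation of P_err
```

The gap between `p_union` and `p_exact` is of second order in the per-stream error probabilities, which is why the sum is an adequate design metric when the weakest stream is reliable.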

3) Precoders that Optimize Schur-convex Objective Functions: It is natural to probe the
optimality of $F_{\mathrm{opt}}$ in (12) if instead of the average error probability, we considered the error
probability corresponding to the weakest data-stream. For this, we now need the counterpart of
Prop. 1, which is as follows.

Proposition 3: Let $f : \mathbb{R}^M \mapsto \mathbb{R}$ be a Schur-convex function over its domain. Also, let $f(\cdot)$
be monotonically increasing in its arguments. The optimal choice of semiunitary precoder $F_{\mathrm{opt}}$
that minimizes $f(\mathrm{MSE})$ is given by

$$F_{\mathrm{opt}} = [v_1 \cdots v_M] \, T, \tag{16}$$

where $T$ is the same unitary matrix as defined in Prop. 2.

Proof: The proof follows along the same lines as Prop. 2. No details are provided. ■

10 In particular, it is shown in [9, App. H] that if the corresponding bit error rate values satisfy BER < 0.02, this is true 
independent of the input constellation. Moreover, in the case of BPSK and QPSK constellations, $P_k(\cdot)$ is convex over the entire
domain of MSE. Note that, as stated in [9], the assumption of BER < 0.02 is mild in a practical scenario since the uncoded 
BER is usually much smaller than 0.02. 
To answer the question that led towards the above proposition, note from Lemma 8 in App. A
that $\max_k P_k$ is a Schur-convex function of MSE. Thus from Prop. 3, the optimal precoder is
as in (16). Further, note that the matrix $T$ in the description of $F_{\mathrm{opt}}$ in (12) and (16) can be
ignored since $s$ is i.i.d. and therefore, so is $Ts$.
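To give a feel for what a unitary rotation of the dominant eigenvectors does to the per-stream MSEs, the Python sketch below (NumPy assumed) uses a normalized DFT matrix as the rotation. A DFT (or Hadamard) rotation is a common choice in the literature on Schur-convex designs because it spreads the MSE evenly across streams; whether it coincides with the construction in App. B is an assumption of this example, as are the dimensions and the helper name `mse_vector`.

```python
import numpy as np

def mse_vector(H, F, rho):
    """Diagonal of (I_M + (rho/M) F^H H^H H F)^{-1}."""
    M = F.shape[1]
    G = F.conj().T @ H.conj().T @ H @ F
    return np.real(np.diag(np.linalg.inv(np.eye(M) + (rho / M) * G)))

rng = np.random.default_rng(5)
N_r, N_t, M, rho = 4, 6, 3, 10.0
H = (rng.standard_normal((N_r, N_t)) + 1j * rng.standard_normal((N_r, N_t))) / np.sqrt(2)

_, _, Vh = np.linalg.svd(H)
V_M = Vh[:M].conj().T                        # M dominant right singular vectors

T = np.fft.fft(np.eye(M)) / np.sqrt(M)       # unitary DFT matrix (hypothetical choice)

mse_plain = mse_vector(H, V_M, rho)          # unequal MSEs, one per eigen-mode
mse_rotated = mse_vector(H, V_M @ T, rho)    # MSEs after the rotation
```

The rotation leaves the sum of the MSEs unchanged but equalizes them, so the worst stream improves at the expense of the best one. This is the trade that a Schur-convex objective such as the maximum per-stream error probability rewards.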

C. General Structured Precoders 

We now generalize our results to the general structured case. 

Proposition 4: Let the structure of the precoder be $F = V_F \Lambda_{\mathrm{fixed}}^{1/2}$, where $\Lambda_{\mathrm{fixed}}$ is some
fixed matrix of rank M with $\mathrm{Tr}(\Lambda_{\mathrm{fixed}}) \leq M$, albeit chosen arbitrarily. That is, in the ensuing
optimization $\Lambda_{\mathrm{fixed}}$ is fixed and we only optimize over $V_F$. As before, the structure of the optimal
$V_F$ depends on the nature of the objective function.

• Schur-concave objective functions (and in particular, the mutual information with Gaussian
input) are optimized by $F$ of the form:

$$F_{\mathrm{opt}} = [v_1 \cdots v_M] \, \Lambda_{\mathrm{fixed}}^{1/2}. \tag{17}$$

• Schur-convex objective functions (and in particular, the average uncoded error probability)
are optimized by $F$ of the form:

$$F_{\mathrm{opt}} = [v_1 \cdots v_M] \, \Lambda_{\mathrm{fixed}}^{1/2} \, T \tag{18}$$
for an appropriately chosen unitary matrix $T$.

Proof: We follow the same proof techniques of Props. 1-3. See Appendix B for details. ■
Thus, even in the more general structured precoding case, the channel diagonalizing structure is 
optimal. 

IV. Statistical Precoding: Preliminaries 

We now assume that instantaneous channel information is not available at the transmitter, but 
channel statistics are known. 

A. Notations 

While much of the notation required in the rest of the paper has been established in Sec. II-A,
we find it convenient to restate some of it that is often used in the ensuing sections. We
assume that $H$ is described by either the separable model or the more general non-separable
model of (2). Let the variance of $H_{\mathrm{ind}}(i,j)$ be denoted by $\sigma_{ij}^2$. The eigenvalues of the transmit
covariance matrix are denoted by $\{\Lambda_t(k)\}$ in the separable case, while in the non-separable case,
the k-th eigenvalue is given by $\sum_{i=1}^{N_r} \sigma_{ik}^2$. In either case, we assume that the columns of $H_{\mathrm{ind}}$ are
arranged such that the transmit eigenvalues are in decreasing order. The channel power of $H$,
$\rho_c$, is given by $\rho_c = \sum_{i=1}^{N_r} \Lambda_r(i) = \sum_{i=1}^{N_t} \Lambda_t(i)$. The normalized channel power is j r = jjf-.

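In the separable case, let $\tilde{\Lambda}_t$ denote the principal $M \times M$ sub-matrix of $\Lambda_t$ and $\tilde{H}_{\mathrm{iid}}$ denote
the $N_r \times M$ principal sub-matrix of $H_{\mathrm{iid}}$. That is,

$$H_{\mathrm{iid}} = \Big[ \underbrace{\tilde{H}_{\mathrm{iid}}}_{N_r \times M} \;\; \underbrace{\check{H}_{\mathrm{iid}}}_{N_r \times (N_t - M)} \Big]. \tag{19}$$

Without any explicit reference to k, we will often denote by $\breve{\Lambda}_t$ the $(M-1) \times (M-1)$ matrix
obtained from $\tilde{\Lambda}_t$ by removing the k-th row and k-th column and by $\breve{H}_{\mathrm{iid}}$, the matrix obtained
from $\tilde{H}_{\mathrm{iid}}$ by removing the k-th column alone. In the non-separable case, let $\tilde{H}_{\mathrm{ind}}$ denote the
$N_r \times M$-dimensional principal sub-matrix of $H_{\mathrm{ind}}$.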

B. Unconstrained Precoders 

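Lemma 2: The optimal precoder $\mathbf{F}_{\mathrm{stat,opt}}$ is of the form $\mathbf{V}_{\mathrm{stat}} \Lambda_{\mathrm{stat}}^{1/2}$, where $\mathbf{V}_{\mathrm{stat}}$ is a set of $M$
dominant eigenvectors of the transmit covariance matrix $\Sigma_t$ and $\Lambda_{\mathrm{stat}}$ is the unique solution to
the following constrained optimization:

$$\Lambda_{\mathrm{stat}} = \arg\max_{\Lambda \in \mathcal{C}} \; E_{\mathbf{H}} \left[ \log\det\left( \mathbf{I}_{N_r} + \frac{\rho}{M} \, \widetilde{\mathbf{H}}_{\mathrm{ind}} \, \Lambda \, \widetilde{\mathbf{H}}_{\mathrm{ind}}^{H} \right) \right] \tag{20}$$

with $\mathcal{C} = \{\Lambda\}$ denoting the convex set of all diagonal $M \times M$ non-negative definite matrices
such that $\mathrm{Tr}(\Lambda) \leq M$. ■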
The optimality of the dominant eigenvectors of $\Sigma_t$ is not surprising (see [10], [11], [21], [23]-
[26], [28] and references therein for problems of a similar nature). The optimization in (20) is
standard: maximizing a concave function over a convex set. A gradient descent-type approach
for this is provided in [30] and a Monte Carlo approach is provided in [21], [28], [29].

C. Structured Statistical Precoders 

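As explained in Sec. III-D, the complexity of solving for $\Lambda_{\mathrm{stat}}$ in (20) may be unaffordable in
many practical scenarios. We therefore pursue two statistics-based precoders: $\mathbf{F}_{\mathrm{semi}}$ and $\mathbf{F}_{\mathrm{fixed}}$,
with $\mathbf{F}_{\mathrm{semi}} = \mathbf{V}_{\mathrm{stat}}$ and $\mathbf{F}_{\mathrm{fixed}} = \mathbf{V}_{\mathrm{stat}} \Lambda_{\mathrm{fixed}}^{1/2}$. The choice of $\Lambda_{\mathrm{fixed}}$ that is of interest here is:

$$\Lambda_{\mathrm{fixed}}(k) = \begin{cases} M \cdot \dfrac{\Lambda_t(k)}{\sum_{j=1}^{M} \Lambda_t(j)} & \text{if } \rho < \mathrm{SNR}_T, \\ 1 & \text{if } \rho \geq \mathrm{SNR}_T. \end{cases} \tag{21}$$

The threshold SNR ($\mathrm{SNR}_T$) is such that

$$\mathrm{SNR}_T = \alpha \cdot \frac{M}{\Lambda_t(M)} \tag{22}$$

for an appropriate choice of $\alpha$, $\alpha > 1$. This choice is motivated by our recent work [45] on
transient-SNR (the SNR at which exciting $M$ modes is information theoretically optimal) design.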
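The switching rule in (21)-(22) can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: it assumes the reading of (21) in which the low-SNR allocation is proportional to the $M$ dominant transmit eigenvalues and normalized to have trace $M$. The function name `fixed_power_allocation` and the default `alpha=2.0` are hypothetical choices; the eigenvalues are borrowed from the separable example of the numerical studies.

```python
import numpy as np

def fixed_power_allocation(lambda_t, M, rho, alpha=2.0):
    """Diagonal of Lambda_fixed under the assumed reading of (21)-(22)."""
    # Keep the M dominant transmit eigenvalues, in decreasing order.
    lam = np.sort(np.asarray(lambda_t, dtype=float))[::-1][:M]
    if lam[-1] <= 0:
        raise ValueError("the M-th transmit eigenvalue must be positive")
    snr_threshold = alpha * M / lam[-1]      # threshold SNR_T of (22)
    if rho < snr_threshold:
        return M * lam / lam.sum()           # proportional to eigenvalues, trace M
    return np.ones(M)                        # uniform power: semiunitary precoder

eigs = [9.80, 5.66, 0.45, 0.09]
low_snr = fixed_power_allocation(eigs, M=2, rho=0.1)    # below the threshold
high_snr = fixed_power_allocation(eigs, M=2, rho=10.0)  # above the threshold
```

Both branches return an allocation with trace $M$, so the transmit power constraint is met on either side of the threshold.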

For a given channel realization, let $I_{\mathrm{stat,semi}}(\rho)$ and $P_{\mathrm{err,stat,semi}}(\rho)$ denote the mutual information
and error probability achievable with $\mathbf{F}_{\mathrm{semi}}$, while $I_{\mathrm{stat,fixed}}(\rho)$ and $P_{\mathrm{err,stat,fixed}}(\rho)$ denote the
corresponding quantities with $\mathbf{F}_{\mathrm{fixed}}$, all at an SNR of $\rho$. Similarly, denote the corresponding quantities
with the three perfect CSI precoders described in Sec. III by $I_{\mathrm{perf,unconst}}(\rho)$,
$I_{\mathrm{perf,semi}}(\rho)$, $I_{\mathrm{perf,fixed}}(\rho)$, and $P_{\mathrm{err,perf,unconst}}(\rho)$, $P_{\mathrm{err,perf,semi}}(\rho)$, $P_{\mathrm{err,perf,fixed}}(\rho)$, respectively. It
is important to note the distinction between these quantities. While $I_{\mathrm{stat},\bullet}(\rho)$ and $P_{\mathrm{err,stat},\bullet}(\rho)$ are
functions of the channel realization $\mathbf{H}$, the precoder structure itself is independent of $\mathbf{H}$ and depends only
on the channel statistics. On the other hand, $I_{\mathrm{perf},\bullet}(\rho)$ and $P_{\mathrm{err,perf},\bullet}(\rho)$, in addition
to being dependent on the channel realization, also correspond to precoders whose structure is
dependent on $\mathbf{H}$ and chosen optimally.

D. Average Relative Difference Metrics 

Towards the goal of studying the proposed scheme(s), we develop universal metrics that 
capture the performance gap between the proposed precoder(s) and an ideal benchmark. We first 
motivate the choice of our metric in an abstract context. 

Let 'scheme 1' and 'scheme 2' denote two signaling schemes with $I_{\mathrm{scheme},1}(\rho)$ and $I_{\mathrm{scheme},2}(\rho)$
denoting the mutual information of the two schemes at an SNR, $\rho$. Our goal is to quantify$^{11}$
whether scheme 1 is better than scheme 2 or not, and if so, by how much. For any signaling
scheme, the average mutual information is a function of $\rho$ as well as the statistical description
of the channel. Irrespective of the spatial correlation, the average mutual information of any
scheme tends to zero as $\rho \to 0$ and tends to infinity as $\rho \to \infty$. For this reason, the difference
in average mutual information between the two schemes can converge to zero as $\rho \to 0$ at a
rate different from that of either scheme, and could blow up to infinity as $\rho \to \infty$. Thus, the
difference in average mutual information is not a good measure for comparing the two schemes.

An efficient comparison of the two schemes is possible by using either of the following set
of average relative difference metrics:

$$\Delta I_{\mathrm{scheme\,1,\,scheme\,2}} \triangleq \frac{E_{\mathbf{H}}\left[ I_{\mathrm{scheme},1}(\rho) - I_{\mathrm{scheme},2}(\rho) \right]}{E_{\mathbf{H}}\left[ I_{\mathrm{scheme},2}(\rho) \right]}, \tag{23}$$

$$\widetilde{\Delta I}_{\mathrm{scheme\,1,\,scheme\,2}} \triangleq E_{\mathbf{H}}\left[ \frac{I_{\mathrm{scheme},1}(\rho) - I_{\mathrm{scheme},2}(\rho)}{I_{\mathrm{scheme},2}(\rho)} \right]. \tag{24}$$

Note that the choice of scheme 2 in the denominator of (23) and (24) is the scheme that
performs relatively poorly. Thus, $\Delta I_{\bullet}$ and $\widetilde{\Delta I}_{\bullet}$ correspond to a worst-case measure of relative
performance. The metrics are more meaningful (than the difference metric) in studying the
relative gap (or closeness) between the schemes$^{12}$, independent of the SNR. While we have used
the case of average mutual information to motivate the need for a relative difference metric, the
same argument is applicable in the error probability case. In fact, the need for such a metric is
more critical in the error probability case since the error probabilities of the schemes that are
being compared (and hence, the difference between them) are small.

$^{11}$ In our setting, 'scheme 1' corresponds to a perfect CSI precoder and 'scheme 2' to a structured statistical precoder.
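The two relative difference metrics differ only in where the expectation is taken, which a short numerical sketch makes concrete. The Python snippet below estimates both from paired per-realization samples. The helper name `relative_difference_metrics` and the synthetic samples (a uniform baseline with a constant gap of 0.5) are assumptions made purely for illustration; they are not channel simulations from the paper.

```python
import numpy as np

def relative_difference_metrics(i_scheme1, i_scheme2):
    """Return (ratio of means, mean of ratios) from paired samples."""
    i1 = np.asarray(i_scheme1, dtype=float)
    i2 = np.asarray(i_scheme2, dtype=float)
    ratio_of_means = np.mean(i1 - i2) / np.mean(i2)   # form of (23)
    mean_of_ratios = np.mean((i1 - i2) / i2)          # form of (24)
    return ratio_of_means, mean_of_ratios

# Synthetic stand-ins for per-realization mutual information values.
rng = np.random.default_rng(0)
i2 = rng.uniform(1.0, 3.0, size=10_000)   # the weaker scheme
i1 = i2 + 0.5                             # the stronger scheme, constant gap
delta_i, delta_i_tilde = relative_difference_metrics(i1, i2)
```

In this toy case the ratio of scheme 1 to scheme 2 is negatively correlated with scheme 2, so by Jensen's inequality the mean-of-ratios metric is at least as large as the ratio-of-means metric.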



E. Problem Setup 

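The main goal of this paper is to quantify, as a function of the statistics and antenna dimensions,

$$\Delta I_{\mathrm{semi}} \triangleq \frac{E_{\mathbf{H}}\left[ I_{\mathrm{perf,unconst}}(\rho) - I_{\mathrm{stat,semi}}(\rho) \right]}{E_{\mathbf{H}}\left[ I_{\mathrm{stat,semi}}(\rho) \right]} \tag{25}$$

in the case of mutual information, and

$$\Delta P_{\mathrm{semi}} \triangleq E_{\mathbf{H}}\left[ \frac{P_{\mathrm{err,stat,semi}}(\rho) - P_{\mathrm{err,perf,unconst}}(\rho)}{P_{\mathrm{err,perf,unconst}}(\rho)} \right] \tag{26}$$

in the case of error probability. In addition, we are also interested in the corresponding quantities
for $\mathbf{F}_{\mathrm{fixed}}$ in (21): $\Delta I_{\mathrm{fixed}}$ and $\Delta P_{\mathrm{fixed}}$.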

While closed-form expressions for the above metrics seem difficult to obtain across all SNR 

regimes, the following simplifying assumptions render these metrics theoretically tractable. 
• Asymptotics of Antenna Dimension(s): Any performance metric computation in the spa- 
tially correlated, finite antenna setting suffers from fundamental difficulties associated with a 
lack of knowledge of the joint probability density function of singular values of the channel 
matrix. However, under many settings, in the asymptotics of antenna dimension(s), the 
density function of eigenvalues converges (in an appropriate sense) to a certain deterministic 
density function. Many recent works on multi-antenna channels (see [10], [11], [21], [28] 
and references therein) exploit this fundamental property in the characterization of various 
information theoretic quantities of interest. 

In this work, we find it useful to separate our study into two cases: 1) An easily tractable
case of relative receive antenna asymptotics, where $M/N_r \to 0$, and 2) A more difficult case
of proportional growth of antenna dimensions, where both $\{M, N_r\} \to \infty$ with $M/N_r \to \gamma$
and $\gamma \in (0, \infty)$ is a constant. The first case includes the following sub-cases in a unified
way: a) $N_t$ and $M$ are finite and $N_r \to \infty$, b) $\{M, N_r\} \to \infty$ with $M/N_r \to 0$, and c) via a
relabeling of indices, the case where $M/N_r \to \infty$ with either $N_r$ finite or $N_r \to \infty$.

$^{12}$ Empirical studies indicate that the correlation coefficient between $I_{\mathrm{scheme},1}(\rho) / I_{\mathrm{scheme},2}(\rho)$ and $I_{\mathrm{scheme},2}(\rho)$ is negative. While this claim
seems plausible given the reciprocal role of $I_{\mathrm{scheme},2}(\rho)$ in the two terms, we do not have a concrete mathematical proof of this
claim. If this claim were to be true, we would have $\Delta I_{\bullet} \leq \widetilde{\Delta I}_{\bullet}$. In any case, it should be clear that $\Delta I_{\bullet}$ and $\widetilde{\Delta I}_{\bullet}$ are related
to each other by an $O(1)$ factor. In Secs. V and VI, we will characterize either coefficient depending on its tractability.

• Signaling Constellation: In the error probability case, it will be shown in Sec. VI that the
relative difference metric can be written in terms of the SINR of the individual data-streams.
Since exact closed-form expressions are known for the SINRs of a linear MMSE
receiver, independent of the signaling constellation, there is no need to constrain the inputs
to be of any particular type. On the other hand, in the case of mutual information, when
Gaussian inputs are used for signaling, the average mutual information is given by the well-known
$\log\det(\cdot)$ formula. However, in the non-Gaussian case, closed-form expressions are
difficult to obtain for mutual information. Thus, we will restrict our attention to average
relative mutual information loss in the Gaussian case. In the non-Gaussian case, the relative
MSE enhancement is a good indicator$^{13}$ of the mutual information loss. Besides this, the
MSE enhancement serves as a soft decision metric when the processed received data is fed
through more complex, non-linear receiver architectures such as a turbo- or LDPC-decoder.

• High-SNR Regime: Computing universal upper bounds for the metrics in (25) and (26), and
the corresponding quantities for $\mathbf{F}_{\mathrm{fixed}}$, that are tight across the entire SNR range seems to
be a difficult proposition. However, when the SNR is reasonably high (more precisely, $\rho > \alpha \frac{M}{\Lambda_t(M)}$
for some suitable $\alpha > 1$), we will see that considerable simplifications and hence,
closed-form characterizations are possible. In this SNR regime, the semiunitary precoder
coincides with the precoder in (21), and the performance of the linear MMSE receiver coincides
with that of another commonly used low-complexity receiver, the zero-forcing receiver.



V. Mutual Information Loss with Semiunitary Precoding 

In this section, we focus on the (average) relative loss in mutual information with $\mathbf{F}_{\mathrm{semi}}$,
assuming Gaussian inputs. The difference $\Delta I_{\mathrm{semi}}$ (see (25)) can be written as

$$\Delta I_{\mathrm{semi}} = \underbrace{\frac{E_{\mathbf{H}}\left[ I_{\mathrm{perf,unconst}}(\rho) - I_{\mathrm{perf,semi}}(\rho) \right]}{E_{\mathbf{H}}\left[ I_{\mathrm{stat,semi}}(\rho) \right]}}_{\Delta I_1} + \underbrace{\frac{E_{\mathbf{H}}\left[ I_{\mathrm{perf,semi}}(\rho) - I_{\mathrm{stat,semi}}(\rho) \right]}{E_{\mathbf{H}}\left[ I_{\mathrm{stat,semi}}(\rho) \right]}}_{\Delta I_2}. \tag{27}$$

Since the argument within the expectation of the numerator of $\Delta I_1$ is not explicitly dependent
on the spatial correlation model, it is straightforward to obtain a bound for $\Delta I_1$.

$^{13}$ The mutual information is related to the MSE of the optimal MMSE receiver through the relationship established in [46],
and not the MSE of the linear MMSE receiver. Despite this difficulty, the MSE enhancement with a linear MMSE receiver is
a good indicator of mutual information loss in the non-Gaussian case [46].

Proposition 5: If p is such that p > aE-a



M 



A H (AT) 



for some a > 1, Aii is bounded as 



AJi < 



2M 



H 



1 



A H (M) 



a 2 ^H [istat, semi(p)] 



H 



1 



A H (M) 



(28) 



Proof: See Appendix ■ 
Intuitively, as $\alpha$ and hence the SNR increases, the waterfilling power allocation of the optimal
precoding scheme converges to uniform power allocation across the $M$ modes (see [10], [11],
[21] etc.) and thus, $\Delta I_1$ decreases. The bound provided in (28) is not tight since we have not
characterized the exact probability $\Pr(n_{\mathbf{H}} < M)$ (in the appendix) that determines $\Delta I_1$. But the above
bound is sufficient to capture the performance loss with uniform power allocation.

Characterization of $\Delta I_2$, which is explicitly dependent on the spatial correlation model, is non-trivial.
In the following series of theorems, we provide bounds for different correlation models
and regimes. We first consider the relative antenna asymptotic case.



A. Separable Model 

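Theorem 1: Let the channel $\mathbf{H}$ be described by the separable model. From the remark in
Footnote 12, $\Delta I_2$ is well-approximated by its more tractable version, $\widetilde{\Delta I}_2$:

$$\widetilde{\Delta I}_2 \triangleq E_{\mathbf{H}}\left[ \frac{I_{\mathrm{perf,semi}}(\rho) - I_{\mathrm{stat,semi}}(\rho)}{I_{\mathrm{stat,semi}}(\rho)} \right]. \tag{29}$$

For any fixed value of $\rho$, $\widetilde{\Delta I}_2$ is bounded as

$$\widetilde{\Delta I}_2 \leq \frac{2 \kappa_1 \sqrt{\sum_{i=1}^{N_r} (\Lambda_r(i))^2}}{\gamma_r N_r M} \, \sum_{k=1}^{M} \frac{1}{\log\left( 1 + \frac{\rho}{M} \Lambda_t(k) \right)}, \tag{30}$$

where $\kappa_1$ is a constant determined from an application of Lemma 13 (in App. A).

Proof: See the appendix. ■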



B. Canonical Model 

Theorem 2: Consider the canonical case with $M/N_r \to 0$. Using the generalized asymptotic
eigenvalue characterization in Lemma 13 (in App. A) and following the approach of Theorem 1,
we have



M 



2 - 2 \ N M Z-^ 



k=l 



1 



7t,fclog (1 + iilt,k) 



(31) 



for some constant $\kappa_2$ determined from Lemma 13. The proof is not provided.

C. Special Case: Beamforming

We now pay attention to the beamforming case ($M = 1$), the low complexity of which
makes it an attractive signaling choice in many wireless standards. While the SNR regime where
beamforming is capacity-optimal has been established in prior work [10], [11], [21], [45], the
performance gap between statistical and perfect CSI beamforming is less clear. Using tools from
eigenvector perturbation theory, introduced in [14], we establish the following results.

First, note that the term $\Delta I_1$ is redundant in the beamforming case. Let $I_{\mathrm{perf}}(\rho)$ and $I_{\mathrm{stat}}(\rho)$
denote the mutual information achievable by beamforming with perfect CSI and statistical
information alone, respectively. Define the loss term

$$\Delta I_{\mathrm{bf}} \triangleq \frac{E_{\mathbf{H}}\left[ I_{\mathrm{perf}}(\rho) - I_{\mathrm{stat}}(\rho) \right]}{E_{\mathbf{H}}\left[ I_{\mathrm{stat}}(\rho) \right]}. \tag{32}$$

The following discussion complements recent work on the performance gap with the separable
model [47], which was established by exploiting some recent advances in random matrix
theory. Unlike [47], which is based on exact random matrix theory results and is applicable
only for $E[I_{\mathrm{perf}}(\rho) - I_{\mathrm{stat}}(\rho)]$ in the separable case, we generalize the results to the canonical
modeling framework, but do not consider fine refinement of constants in the following results
for the sake of brevity.

Proposition 6: There exists a constant $\kappa_3$ such that $\Delta I_{\mathrm{bf}}$ is bounded as



N r 



A/ bf < v „ u , „ '-■ (33) 

The constant k 3 is model- (separable or canonical) and regime- (proportional growth or relative 
asymptotics) dependent. Simple bounds for k 3 are as follows: 1) A t (l) ^1 + k 31 ^ N * Nr - j for 
the separable and relative asymptotics case, 2) 7^1 + K% t 2^/N t N r for the canonical and relative 
asymptotics case, 3) — • A t (l) in the proportional growth setting for the separable case, and 
4) n 3ji N r for the canonical case. The constants i = 1, • • • ,4 are independent of N t , N r , S t 
and S r . 

Proof: See Appendix [0 ■ 



D. Proportional Growth of Antenna Dimensions: Separable Case 



Theorem 3: Let H be characterized by the separable model. Let {M, N r } — > 00 with j^- — >• 7 
and 7 G (0,oo). Let the following conditions hold: 1) -^gr = 0(1), 2) 4jm = 3 ) 



MM) = 4) =h = 0{ll and 5) E^a.M = h = c(1)> If p > a M for
some $\alpha > 1$, $\Delta I_2$ is bounded as

\og{e/M) + k 4 



Ah < 



log(p/e) + ^ J2k=i lo S 



At(k)A r (k) 

Pc 



k'a + min ( Eh 



log 



A max (H iid A r H 



iid , 



Eh 



log 



A, 



:H iid A 4 H 



G 



M, A t 



(34) 



(35) 



Gm, a,. 

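where $\kappa_4'$ depends only on the constants in the statement of the theorem, and $G_{M,\Lambda_\bullet}$ are the
geometric means of eigenvalues, defined as

$$G_{M,\Lambda_r} \triangleq \left( \prod_{k=1}^{M} \Lambda_r(k) \right)^{1/M}, \qquad G_{M,\Lambda_t} \triangleq \left( \prod_{k=1}^{M} \Lambda_t(k) \right)^{1/M}. \tag{36}$$

Proof: See the appendix.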



E. Discussion 

It is of interest to understand the structure of the scheme that is optimal from a mutual
information viewpoint for a given channel. While many advances have been made along this
direction (in particular, regarding the eigenvectors of the optimal input) [10], [11], [21], [23]-
[32], a complete understanding is rendered difficult by the lack of a comprehensive random
matrix theory for correlated channels. Theorems 1-2 provide an alternative approach, where we
characterize the structure of $\mathbf{H}$ that is 'best' or 'worst' for a given precoding scheme.

Let us now freeze $\Lambda_r$ to be a fixed matrix so as to develop an understanding of the structure
of $\Lambda_t$ that minimizes performance loss. Given that a constraint $\sum_{i=1}^{N_t} \Lambda_t(i) = \rho_c$ has to be met,
it can be checked that performance loss in (30), (31) and (34) is minimized by the following
choice: $\Lambda_t(1) = \cdots = \Lambda_t(M) = \rho_c / M$ and $\Lambda_t(M+1) = \cdots = \Lambda_t(N_t) = 0$. On the other extreme,
the worst choice of $\Lambda_t$ that maximizes the performance loss is of the form: $\Lambda_t(1) \approx \rho_c$ and
$\Lambda_t(i) \approx 0$, $i \geq 2$, but with the added constraint that $\mathrm{rank}(\Lambda_t) \geq M$. It is important to note that
the largest gap$^{14}$ is not achieved when $\mathrm{rank}(\Lambda_t) = 1$. Motivated by Theorem 3, we define a
matching metric for the transmitter side:

$$\mathcal{M}_t \triangleq \prod_{k=1}^{M} \Lambda_t(k) \tag{37}$$

that captures the closeness of a given channel to the best and worst channels (characterized
above). As $\mathcal{M}_t$ increases, the channel becomes more matched on the transmitter side and the
performance loss decreases, and vice versa.

$^{14}$ In fact, if $\mathrm{rank}(\Lambda_t) = 1$, the statistical precoder achieves the same throughput as the optimal precoder.

Capturing the impact of $\Lambda_r$ on performance loss is difficult since $\Lambda_r$ is hidden in the first-order
analysis of Theorems 2 and 3. Nevertheless, (30) shows that a matching metric for the receiver
side can be defined as

$$\mathcal{M}_r \triangleq \sum_{i=1}^{N_r} (\Lambda_r(i))^2. \tag{38}$$

Again, with a constraint $\sum_{i=1}^{N_r} \Lambda_r(i) = \rho_c$ to be met, it can be seen that $\mathcal{M}_r$ is minimized
by $\Lambda_r = \frac{\rho_c}{N_r} \mathbf{I}_{N_r}$ and maximized by $\Lambda_r(1) \approx \rho_c$ and $\Lambda_r(i) \approx 0$, $i \geq 2$, but with the added
constraint that $\mathrm{rank}(\Lambda_r) \geq M$. It can be seen that the performance loss is not maximized when
$\mathrm{rank}(\Lambda_r) < M$.
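Both matching metrics are cheap to evaluate from the eigenvalue profiles alone. The Python sketch below takes the transmit metric to be the product of the $M$ dominant transmit eigenvalues and the receive metric to be the sum of squared receive eigenvalues, which is the reading of (37) and (38) assumed here. The function names are hypothetical, and the eigenvalue profiles are those of the 4 × 4, $M = 2$ setting used later in the numerical studies.

```python
import numpy as np

def transmit_matching_metric(lambda_t, M):
    """Product of the M dominant transmit eigenvalues (assumed form of (37))."""
    lam = np.sort(np.asarray(lambda_t, dtype=float))[::-1]
    return float(np.prod(lam[:M]))

def receive_matching_metric(lambda_r):
    """Sum of squared receive eigenvalues (assumed form of (38))."""
    return float(np.sum(np.asarray(lambda_r, dtype=float) ** 2))

# Channel power rho_c = 16 in every profile below.
matched_t = transmit_matching_metric([8, 8, 0, 0], M=2)       # two equal modes
mismatched_t = transmit_matching_metric([4, 4, 4, 4], M=2)    # power spread over 4 modes
rank_one_t = transmit_matching_metric([16, 0, 0, 0], M=2)     # single dominant mode
flat_r = receive_matching_metric([4, 4, 4, 4])                # well-conditioned receive side
peaky_r = receive_matching_metric([13, 1, 1, 1])              # ill-conditioned receive side
```

The transmit metric is largest for the profile with exactly $M$ equal non-zero eigenvalues, while the receive metric is smallest for the flat receive profile, in line with the discussion above.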

A channel that is matched on both the transmitter and the receiver sides is referred to as
a matched channel and is optimal for the given precoder structure (fixed choice of $M$). The
structure of the matched channel can be summarized as: 1) The rank of $\Lambda_t$ is $M$ with the
dominant transmit eigenvalues being well-conditioned, and 2) $\Lambda_r$ is also well-conditioned. A
channel that is ill-conditioned on both the transmit and the receive sides such that $\mathrm{rank}(\mathbf{H}) \geq M$
(with probability 1) is said to be a mismatched channel.

An interesting consequence of the study in Theorems 1 and 2 is that channel hardening,
which occurs as $N_r$ increases, results in the vanishing of $\Delta I_{\mathrm{semi}}$. That is, statistical information
is as good as perfect CSI in the receive antenna asymptotics. This behavior is peculiar to
this asymptotic regime and will also be observed in the error probability case. The high-SNR
characterization for signaling with $M$ spatial modes ($\rho > \alpha \frac{M}{\Lambda_t(M)}$ for some $\alpha > 1$) has also been
identified in prior work [45].
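Channel hardening can be observed numerically in the simplest possible setting. The Python sketch below is an illustration under stated assumptions, not a reproduction of the paper's experiments: it draws i.i.d. complex Gaussian channels (so the transmit covariance is the identity), and measures how far the normalized Gram matrix $\mathbf{H}^H \mathbf{H} / N_r$ is from its mean as $N_r$ grows. The function name `hardening_deviation` and the trial counts are hypothetical.

```python
import numpy as np

def hardening_deviation(n_r, n_t=4, trials=50, seed=0):
    """Average Frobenius distance between H^H H / n_r and the identity."""
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(trials):
        # i.i.d. CN(0, 1) entries: unit variance split over real and imaginary parts.
        h = (rng.standard_normal((n_r, n_t))
             + 1j * rng.standard_normal((n_r, n_t))) / np.sqrt(2)
        gram = h.conj().T @ h / n_r
        total += np.linalg.norm(gram - np.eye(n_t))
    return total / trials

deviations = {n_r: hardening_deviation(n_r) for n_r in (4, 64, 1024)}
```

The deviation shrinks as $N_r$ grows with $N_t$ fixed, which is the mechanism by which the instantaneous channel carries little information beyond the statistics in this regime.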



VI. Error Probability Enhancement with Semiunitary Precoding 



In this section, we study the (average) relative error probability enhancement, $\Delta P_{\mathrm{semi}}$, with
semiunitary precoding in the high-SNR regime. Towards this goal, we first note that $\Delta P_{\mathrm{semi}}$
in (26) can be written$^{15}$ as

\ -f A;, stat, semi (p) Pfs, perf , un 



AR, 



H 



(a) 

< E H 



-Xp) 



J^fc=l Ek, perf, unconst \P) 



1 M 

-■Y 



k=l 



Ek, stat, semi (p) Pk, perf, unconst (p) 
Pk, perf, unconst (p) 



(39) 
(40) 



where (a) follows from Lemma 9.

$^{15}$ Note that $\Delta P_{\mathrm{semi}}$ is independent of how error probability is defined: averaged across data-streams or at least one data-stream
in error.

Proposition 7: The loss term, $\Delta P_{\mathrm{semi}}$, can be bounded as



AP semi < E\ 



H 



1 M 

M ' S 



, /3 2 ASINR,. 
exp ( - — 2 — 



1 + 



ASINRi. 



SINR fc , stat , 



fc=l 



1 - 



/3 2 SINR fciPerf , un 



(41) 



where 

ASINRfc = SINRfcperfunconst — SINRfc statj ; 



1 | A wf (A:)A fc (A f Hff d A. r H,i d ) det + ifc • A f 1/2 H^ d A r H iid A t 1/2 



Pc 



See notations established in Sec. IV-A.



det (Im.! + jfc ■ A 1 / 2 H? d A r H iid A]' 2 



Proof: See Appendix G. ■
As in Sec. V, we consider the separable and canonical models for the relative antenna
asymptotic case separately.



A. Separable Model 



Theorem 4: In the separable case, if $\rho > \alpha \frac{M}{\Lambda_t(M)}$ for some $\alpha > 1$, $\Delta P_{\mathrm{semi}}$ can be bounded as



AR f 



< 







k=l M 7 



+ 



M 



E 



- + — 

a ar 



i 



A H (M) 



V 



( E _A H (M). ) 



+ — o 

It 



f N t + VM 



\ 



/ 



(42) 



Thus the dominant term of AP semi in the relative antenna asymptotics and large a is of the form: 

jl sr M j_ o2 E^LiAt(fc) 

/32 p ■ l^k=l A t (k) ~T~ V A t (M) ■ 

Proof: See Appendix H. ■



B. Canonical Model

For the sake of simplicity, we characterize only $\Delta P_2$, the performance gap between the statistical
and perfect CSI semiunitary precoders. Following the development of Theorem 4, it is
straightforward to extend this result to $\Delta P_{\mathrm{semi}}$.

Theorem 5: Let $\rho > \alpha \frac{M}{\gamma_{t,M}} = \alpha \frac{M}{\sum_{i} \sigma_{iM}^2}$. The dominant term of $\Delta P_2$ is bounded as



M 



m <_ + I, E ^.£%- ^™ (43) 



2a M P 2 pf^lt,k 2 7r M y JW r 

Proof: The proof follows along the same lines as Theorem 4 by applying the second part
of Lemma 13 (see App. A). No explicit proof is provided. ■

C. Special Case: Beamforming 

In the beamforming setting, our earlier work [14], [48] leverages advances in eigenvector 
perturbation theory to provide bounds on AP bf , the gap in performance between statistical and 
perfect CSI beamforming. These results are summarized in the following lemmas. 

Lemma 3: Let H be described by the separable model. Assume that A t (l) > A t (2) ^1 + - 2 N ^ 
for some rj > 0. There exists a constant K\ such that 



4JV<«i#.« (45) 
Gap 4 j r y N r 

where $\mu_{r,2}$ corresponds to the second moment of the receive eigen-modes and $\mathrm{Gap}_t$ corresponds
to the separation between the transmit eigen-modes, and are defined as

A*r,2- ^ , Gap.-l-^. (46) 

■ 

Lemma 4: Let H be described by the canonical model. If ^jf- > j^- + for some r] > 0, 
there exists a constant iY 2 such that 



where Gap£ and 2 are defined as 



I ^ /V 2 V a 2 a 2 

Ga P ^-^-V ^ ^ 4 2^_i. (48) 

^ - 1 ^ (7t,i - 7t,fc) J>1 



Thus in the asymptotics of $N_r$ relative to $N_t$, even channel statistical information is sufficient
for near-perfect CSI performance. Further, given a fixed $N_t$ and $N_r$, ill-conditioning of $\Sigma_t$ and
well-conditioning of $\Sigma_r$ reduces $\Delta P_{\mathrm{bf}}$. We also provided evidence in [14], [48] that, of these
two factors, the conditioning of $\Sigma_t$ is more critical than that of $\Sigma_r$. Theorems 4-5 provide a
multi-mode generalization of these results.

D. Discussion

As in the mutual information case, we are interested in channels that minimize and maximize
the performance loss $\Delta P_{\mathrm{semi}}$. From (42) and (44), it is observed that the choice of $\Lambda_t$ that
minimizes performance loss is such that: 1) It minimizes $\frac{1}{\Lambda_t(k)}$, $1 \leq k \leq M$, and 2) It also
minimizes $\sum_{k=1}^{M} \frac{\Lambda_t(k)}{\Lambda_t(M)}$. Both of these constraints are met by a channel that maximizes $\mathcal{M}_t$ (as
defined in (37) for the mutual information case). That is, a channel that is matched on the
transmitter side from a mutual information viewpoint is also matched on the transmitter side
from an error probability viewpoint. However, it is difficult to make similar conclusions about
matching on the receiver side.

On the other hand, note that as the constellation size increases, $\beta$ decreases. Thus, for any
fixed $\rho$, the first dominant term of $\Delta P_{\mathrm{semi}}$ in (42) and (44) increases as the constellation size
increases, whereas the second term decreases. The tension between the two dominant terms
determines the optimal choice of constellation to use at a fixed SNR over a given channel. In the
extreme case of asymptotically high SNR, the first term vanishes and $\Delta P_{\mathrm{semi}}$ is minimized with
the largest constellation available in the signaling set. The optimality of a larger constellation
at high SNR from an error probability viewpoint is to be intuitively expected. Further, as in the
mutual information case, channel hardening results in vanishing $\Delta P_{\mathrm{semi}}$ as $N_r$ increases. In the
more realistic case of proportional growth of antenna dimensions, it is difficult to establish that
$\Delta\mathrm{SINR}_k \to 0$ as $\rho \to \infty$. We postpone the study of this case to future work.



VII. MSE Enhancement with Statistical Precoding 
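We finally consider the (average) relative MSE enhancement. Define $\Delta\mathrm{MSE}$ as

$$\Delta\mathrm{MSE} \triangleq \frac{1}{M} E_{\mathbf{H}}\left[ \sum_{k=1}^{M} \frac{\mathrm{MSE}_{k,\mathrm{stat,semi}} - \mathrm{MSE}_{k,\mathrm{perf,unconst}}}{\mathrm{MSE}_{k,\mathrm{perf,unconst}}} \right]. \tag{49}$$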



The following proposition establishes the trend of $\Delta\mathrm{MSE}$ under certain settings.

Proposition 8: In the receive antenna asymptotics case, if $\rho > \alpha \frac{M}{\Lambda_t(M)}$, $\Delta\mathrm{MSE}$ is bounded as



AMSE M M , n 
< — + — -0 



M+JN t 



M 



As SNR increases, the dominant term of AMSE is 



+ mE 



A t (fc) (A wf (A) - fg) 



k=l 



pAt(k) 
M 



(50) 



M 

AMSE < — • O 



nsfi 



(51)

Proof: Note that $\mathrm{MSE}_{k,\bullet}$ is defined as $\mathrm{MSE}_{k,\bullet} = \frac{1}{1 + \mathrm{SINR}_{k,\bullet}}$ and hence, we have

$$\Delta\mathrm{MSE} = \frac{1}{M} \sum_{k=1}^{M} E_{\mathbf{H}}\left[ \frac{\Delta\mathrm{SINR}_k}{1 + \mathrm{SINR}_{k,\mathrm{stat,semi}}} \right]. \tag{52}$$

Following (130) and (131) in the appendix, (50) follows immediately in the receive antenna
asymptotics case. ■
While we expect $\Delta\mathrm{MSE} \to 0$ in the proportional growth case also, we do not have a
mathematical proof of this fact. This will be addressed in future work.

VIII. Numerical Studies 

In this section, we illustrate the results established in this paper via some numerical studies.
We consider 4 × 4 channels for our study where $M = 2$ data-streams are excited with: 1)
Gaussian inputs for the mutual information case, and 2) QPSK inputs for the error probability
case. In all the cases, the channel power is normalized to $N_t N_r = 16$.




Fig. 1. Mutual information of the perfect CSI and the statistical semiunitary precoders over matched and mismatched channels. Horizontal axis: SNR (dB).



• Matched vs. Mismatched Channels: The first study illustrates the performance of statistical
semiunitary precoding over matched and mismatched channels. We consider a 4 × 4 matched
channel with normalized separable model, where $\mathrm{diag}(\Lambda_t) = [8 \;\; 8 \;\; 0 \;\; 0]$. The mismatched
channel is characterized by $\mathrm{diag}(\Lambda_t) = [4 \;\; 4 \;\; 4 \;\; 4]$. In both the cases, $\Lambda_r = 4\mathbf{I}_4$. Fig. 1 shows
the average mutual information with perfect CSI and statistical semiunitary precoding in
the two channels.

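As explained before, the mutual information in the four cases are given by:

$$I_{\mathrm{matched,perf}}(\rho) = I_{\mathrm{matched,stat}}(\rho) = E\left[ \sum_{i=1}^{M} \log\left( 1 + \frac{\rho}{M} \cdot \frac{N_t}{M} \, \lambda_i\big( \widetilde{\mathbf{H}}_{\mathrm{iid}}^{H} \widetilde{\mathbf{H}}_{\mathrm{iid}} \big) \right) \right] \tag{53}$$

$$I_{\mathrm{mismatched,perf}}(\rho) = E\left[ \sum_{i=1}^{M} \log\left( 1 + \frac{\rho}{M} \, \lambda_i\big( \mathbf{H}_{\mathrm{iid}}^{H} \mathbf{H}_{\mathrm{iid}} \big) \right) \right] \tag{54}$$

$$I_{\mathrm{mismatched,stat}}(\rho) = E\left[ \sum_{i=1}^{M} \log\left( 1 + \frac{\rho}{M} \, \lambda_i\big( \widetilde{\mathbf{H}}_{\mathrm{iid}}^{H} \widetilde{\mathbf{H}}_{\mathrm{iid}} \big) \right) \right] \tag{55}$$

where $\widetilde{\mathbf{H}}_{\mathrm{iid}}$ and $\mathbf{H}_{\mathrm{iid}}$ are $N_r \times M$ and $N_r \times N_t$ i.i.d. matrices, and $\lambda_i(\cdot)$ denotes the $i$-th largest eigenvalue. As can be seen from (53),
(55) and Fig. 1, the performance of the mismatched statistical precoder is $10 \log_{10}(N_t / M) \approx 3$
dB away from both the matched precoders. It is also surprising that the matched precoders
have nearly the same performance as the mismatched (i.i.d. channel) optimal precoder.
This seems to be related to the choice of $N_t$, $N_r$, $M$ and eigen-properties of i.i.d. random
matrices.

Fig. 2. Gap in performance between statistical and perfect CSI semiunitary precoding as a function of the matching metric,
$\mathcal{M}_t$: (a) Mutual information and (b) Error probability.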



• Performance Gap as a Function of Matching Metric: The second study focuses on the
gap in performance between the perfect CSI and the statistical precoders, as a function of
the degree of matching of the channel to the precoder structure. We consider 4 × 4 channels
with $M = 2$, and freeze $\mathbf{U}_t$, $\mathbf{U}_r$ to some arbitrary choice in our study. We also freeze $\Lambda_r$ to
$4\mathbf{I}_4$ so as to focus on the impact of matching on the transmitter side. Note that the matching
metric (defined in Sec. V-E), $\mathcal{M}_t = \prod_{k=1}^{M} \Lambda_t(k)$, takes values in the range $(0, 64]$ in our
setting. A family of ~1700 channels (each characterized uniquely by $\Lambda_t(k)$, $k = 1, \cdots, N_t$)
is generated such that $\sum_{k=1}^{N_t} \Lambda_t(k) = \rho_c = 16$ and $\mathcal{M}_t$ takes values over its range. The
channels become more matched (on the transmitter side) to the precoder structure as $\mathcal{M}_t$
increases.

While much of our study in the preceding sections is based on asymptotic random matrix
theory, Fig. 2 illustrates that the notion of matched channels developed in this work is useful
in characterizing performance, even in practically relevant regimes like 4 × 4 channels.
Fig. 2(a) illustrates that $\Delta I_{\mathrm{semi}}$ decreases as the channel becomes more matched on the
transmitter side for three choices of $\rho$, whereas Fig. 2(b) illustrates the same trend for
$\Delta P_{\mathrm{semi}}$. Note that for a given channel as $\rho$ increases, $\Delta I_{\mathrm{semi}}$ decreases whereas $\Delta P_{\mathrm{semi}}$
increases. This is because of the contrasting behaviors of $I_{\mathrm{stat,semi}}(\rho)$ and $P_{\mathrm{err,perf,unconst}}(\rho)$
as $\rho$ increases.

It is important to note the following. In general, there exists no ordering relationship
between any two matrix channels [49]. Nevertheless, Fig. 2 shows that the relative (mutual
information or error probability) performance of two channels can be compared by using
$\mathcal{M}_t$ and $\mathcal{M}_r$. A channel that is more matched leads to a smaller value of $\Delta I_{\bullet}$ as well as
$\Delta P_{\bullet}$ for any fixed SNR.

• Asymptotic Optimality: The third study illustrates the asymptotic optimality of statistical
precoding. Fig. 3 plots $\Delta I_{\mathrm{semi}}$ and $\Delta P_{\mathrm{semi}}$ as a function of $N_r$ with $N_t$ and $M$ fixed at $N_t = 4$
and $M = 2$. The channels have separable correlation with $\Lambda_t = \mathbf{I}_4$ whereas $\Lambda_r = \mathbf{I}_{N_r}$ and
hence, $\rho_c = 4$ for all the channels. As can be seen from the study in the previous sections
as well as the figures, channel hardening, where the eigenvectors of $\mathbf{H}^H \mathbf{H}$ converge to the
eigenvectors of $\Sigma_t = E[\mathbf{H}^H \mathbf{H}]$ as $N_r$ grows relative to $N_t$, ensures that even channel statistical information
is as good as perfect CSI with respect to performance.

Fig. 3. Asymptotic optimality of the statistical semiunitary precoder for fixed $N_t = 4$, $M = 2$ as $N_r$ increases: (a) Mutual
information and (b) Error probability.

Fig. 4. Low- and medium-SNR mutual information performance of the statistical precoder in (21) when compared with the
semiunitary precoder for a) separable and b) non-separable (canonical) models.

• Low- and Medium-SNR Regimes: The last study of this section studies the mutual information
performance of the statistical precoder in (21) when compared with a semiunitary
precoder in the low- and the medium-SNR regimes. In the high-SNR regime, the optimal
perfect CSI precoder excites the $M$ modes uniformly with equal power. However, in the
low-SNR regime, the perfect CSI precoder allocates power to the transmit eigen-modes non-uniformly.
The precoder structure in (21) excites the $M = 2$ modes with power proportional
to the transmit eigenvalues and hence, performs better than the semiunitary precoder. Fig. 4(a)
shows the performance of the statistical precoder in a channel with separable correlation,
while Fig. 4(b) corresponds to a channel with non-separable correlation. In the separable
case, the transmit and the receive eigenvalues are given by $\mathrm{diag}(\Lambda_t) = [9.80 \;\; 5.66 \;\; 0.45 \;\; 0.09]$
and $\mathrm{diag}(\Lambda_r) = [8.58 \;\; 4.20 \;\; 1.98 \;\; 1.24]$ whereas in the canonical case the variance matrix,
$\mathbf{M} = (\sigma_{ij}^2)$, is given by

$$\mathbf{M} = \begin{bmatrix} 1.66 & 0.31 & 1.71 & 0.31 \\ 2.24 & 0.18 & 0.15 & 0.54 \\ 1.97 & 1.46 & 0.70 & 0.28 \\ 1.65 & 1.65 & 0.49 & 0.71 \end{bmatrix}. \tag{56}$$

It is interesting to note that the perfect CSI semiunitary precoder may perform either better
or worse than the precoder in (21). Future work will look at this aspect more
carefully.
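The 3 dB offset reported in the first study can be checked with a small Monte Carlo experiment. The Python sketch below is illustrative only and rests on assumptions that are not spelled out in this section: the eigenvector matrices are taken to be identities, and the channel is normalized as $\mathbf{H} = \Lambda_r^{1/2} \mathbf{H}_{\mathrm{iid}} \Lambda_t^{1/2} / \sqrt{\rho_c}$. The function name `avg_mi_stat_semi`, the trial count and the SNR value are hypothetical.

```python
import numpy as np

def avg_mi_stat_semi(lambda_t, lambda_r, M, rho, trials=500, seed=0):
    """Monte Carlo average mutual information (bits) of the statistical
    semiunitary precoder, under the assumed normalization of H."""
    lam_t = np.asarray(lambda_t, dtype=float)   # assumed sorted in decreasing order
    lam_r = np.asarray(lambda_r, dtype=float)
    rho_c = lam_t.sum()                         # channel power
    n_r, n_t = lam_r.size, lam_t.size
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(trials):
        h_iid = (rng.standard_normal((n_r, n_t))
                 + 1j * rng.standard_normal((n_r, n_t))) / np.sqrt(2)
        h = np.sqrt(lam_r)[:, None] * h_iid * np.sqrt(lam_t)[None, :] / np.sqrt(rho_c)
        hf = h[:, :M]                           # F_semi keeps the M dominant modes
        gram = np.eye(M) + (rho / M) * (hf.conj().T @ hf)
        total += np.linalg.slogdet(gram)[1] / np.log(2)
    return total / trials

rho = 10.0                                      # linear SNR
receive = [4, 4, 4, 4]
matched = avg_mi_stat_semi([8, 8, 0, 0], receive, M=2, rho=rho)
mismatched = avg_mi_stat_semi([4, 4, 4, 4], receive, M=2, rho=rho)
mismatched_shifted = avg_mi_stat_semi([4, 4, 4, 4], receive, M=2, rho=2 * rho)
gap_db = 10 * np.log10(4 / 2)                   # 10 log10(N_t / M)
```

Because the same seed is reused, the matched channel at SNR $\rho$ and the mismatched channel at SNR $2\rho$ see identical random draws, so the two estimates agree up to rounding. Under the stated normalization, the statistical precoder on the mismatched channel is a pure SNR shift of $10 \log_{10}(N_t/M)$ dB away from the matched one.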



IX. Concluding Remarks

The main focus of this work is on precoding for spatially correlated multi-antenna channels
that are often encountered in practice. Motivated by many recent wireless standardization
efforts, we proposed low-complexity structured precoding techniques in this paper.
Here, the eigen-modes of the precoder are chosen to be the dominant eigenvectors of the transmit
covariance matrix, whereas the power allocation across the excited modes is obtained via certain
simple, low-complexity methods. A special case of the structured precoder is the semiunitary precoder,
where the spatial modes are excited with uniform power.

In this work, we first established the structure of the optimal perfect CSI structured precoder
and showed that it naturally extends the channel diagonalizing architecture of the perfect CSI
unconstrained precoder. We motivated the need for a relative difference metric that captures the
impact of lack of perfect CSI on the precoder performance, independent of the operating SNR.
We then analytically characterized the average relative mutual information loss (as well as the
average relative uncoded error probability enhancement) of the statistical semiunitary precoder
using tools from random matrix and eigenvector perturbation theories.

Our results show that given a precoder architecture (that is, fixed antenna dimensions and
precoder rank), the relative difference metrics are minimized by a channel that is matched to
it. A matched channel is one that has: 1) the same number of dominant transmit eigen-modes
as the precoder rank, and 2) dominant transmit as well as receive eigen-modes that
are well-conditioned. Our theoretical study also characterizes matching metrics that enable the
comparison of two channels with respect to the performance loss captured by the relative difference
metrics. In particular, as the channel becomes more matched to the precoder structure and the
matching metrics change continuously with it, the performance loss decreases monotonically,
and vice versa. Numerical studies are provided to illustrate our results.

Our work is a first attempt to analytically study the performance of low-complexity statistical
precoding with respect to a perfect CSI benchmark. Much of this study has been made
possible by substantial advances in capturing the eigen-properties of random matrices with
independent entries. Nevertheless, there exist many directions along which this work can be
developed. We now list a few of these directions.

This work is limited to the high-SNR, large antenna asymptotic regime where a comprehensive
random matrix theory is available to capture precoder performance [50]. Even in this regime, it
may be possible, as in [47], to refine the constants in the bounds for the relative loss terms and
obtain further insights on the impact of spatial correlation on performance loss. Besides that, in
the case of proportional growth of antenna dimensions with a non-separable correlation model,
neither mutual information nor error probability has been characterized completely in
this work. The lack of closed-form mutual information expressions for non-Gaussian
inputs limits the development of this work. The notion of precoder-channel matching introduced
in this work can be developed further to aid in the design of low-complexity, structured and
adaptive signaling schemes. In the case of mismatched channels, the construction of limited
feedback schemes to bridge the gap in performance has been undertaken in [13], [51], [52]. The
question of trade-offs between spatial and spatio-temporal precoding [53] and extensions to
more general Ricean fading [54], multi-user [55], and wideband [56] systems are also of interest.



Appendix 

A. Key Mathematical Results 

We now introduce some key mathematical results that will be needed in the ensuing proofs. 
Majorization Theory: We start with a few results from majorization theory [49].

Definition 1: Let $\mathbf{a}$ and $\mathbf{b}$ be two vectors in $\mathbb{R}^m$ in non-increasing order (see the footnote below), i.e., $a(1) \geq \cdots \geq a(m)$ and $b(1) \geq \cdots \geq b(m)$. Then $\mathbf{a}$ is majorized by $\mathbf{b}$ (denoted by $\mathbf{a} \prec \mathbf{b}$) if

$$\sum_{i=1}^{k} a(i) \leq \sum_{i=1}^{k} b(i), \quad 1 \leq k \leq m \tag{57}$$

with equality if $k = m$. ■
Remark 1: For example, if $m = 3$, any positive vector $\mathbf{a}$ such that $\sum_{i=1}^{3} a(i) = 1$ satisfies the
following majorization relationship:

$$\mathbf{a}_{\mathrm{low}} \prec \mathbf{a} \prec \mathbf{a}_{\mathrm{high}} \tag{58}$$

where $\mathbf{a}_{\mathrm{low}} = [\tfrac{1}{3} \;\; \tfrac{1}{3} \;\; \tfrac{1}{3}]$ and $\mathbf{a}_{\mathrm{high}} = [1 \;\; 0 \;\; 0]$. Another example of a majorization relationship is
provided by an $m \times m$ Hermitian matrix $\mathbf{X}$, with $m$-dimensional vectors $\mathbf{e}$ and $\mathbf{d}$ denoting the
eigenvalues and diagonal entries of $\mathbf{X}$, respectively. We have $\mathbf{d} \prec \mathbf{e}$. From the definition, it can
also be easily checked that if $\mathbf{a} \prec \mathbf{b}$, then $-\mathbf{a} \prec -\mathbf{b}$.

Footnote: The non-increasing order for vectors results in ambiguity in a majorization relationship. To resolve this, in this section, we
will assume that any two comparable vectors are always in the non-increasing order.
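
As a minimal sketch of the majorization definition above (an illustration only, not part of the original derivation), the following Python snippet checks the partial-sum conditions for the $m = 3$ example of Remark 1. The helper name `is_majorized` and the test vector `a` are assumptions introduced here, and the two extreme vectors are taken to be [1/3, 1/3, 1/3] and [1, 0, 0].

```python
def is_majorized(a, b, tol=1e-12):
    """Return True if a is majorized by b (partial sums of the sorted vectors)."""
    if len(a) != len(b):
        return False
    # Sort both vectors in non-increasing order, as the definition assumes.
    a_sorted = sorted(a, reverse=True)
    b_sorted = sorted(b, reverse=True)
    sum_a = 0.0
    sum_b = 0.0
    for k in range(len(a_sorted)):
        sum_a += a_sorted[k]
        sum_b += b_sorted[k]
        # Every partial sum of a must not exceed the matching partial sum of b.
        if sum_a > sum_b + tol:
            return False
    # The full sums must agree (equality at k = m).
    return abs(sum_a - sum_b) <= tol


a_low = [1 / 3, 1 / 3, 1 / 3]
a_high = [1.0, 0.0, 0.0]
a = [0.5, 0.3, 0.2]  # hypothetical positive vector with unit sum

print(is_majorized(a_low, a), is_majorized(a, a_high), is_majorized(a_high, a))
```

The uniform vector sits at the bottom of the ordering and the single-spike vector at the top, which is the sense in which equal power allocation is the "most spread out" choice.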

Lemma 5: A matrix $\mathbf{Q}$ is said to be unitary-stochastic if there exists a unitary matrix $\mathbf{T}$ such
that $Q(i,j) = |T(i,j)|^2$ [49, Sec. 2B.5, p. 23]. By definition, a unitary-stochastic matrix is
doubly stochastic. If $\mathbf{u} \prec \mathbf{v}$, there exists a unitary-stochastic matrix $\mathbf{Q}$ such that $\mathbf{u} = \mathbf{v}\mathbf{Q}$. ■
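
The lemma can be illustrated with a small sketch. Here the unitary matrix is assumed to be the normalized DFT matrix (a choice made only for this example, not one stated in the text), for which every entry of $\mathbf{Q}$ equals $1/m$ and $\mathbf{v}\mathbf{Q}$ is the constant vector of the mean of $\mathbf{v}$. The function names are hypothetical.

```python
import cmath
import math


def dft_unitary(m):
    """m x m unitary DFT matrix with entries exp(-2*pi*1j*i*j/m) / sqrt(m)."""
    return [[cmath.exp(-2j * math.pi * i * j / m) / math.sqrt(m)
             for j in range(m)] for i in range(m)]


def unitary_stochastic(T):
    """Entrywise squared magnitude: Q(i, j) = |T(i, j)|^2."""
    return [[abs(t) ** 2 for t in row] for row in T]


def row_vector_times(v, Q):
    """Row vector times matrix: (vQ)(j) = sum_i v(i) Q(i, j)."""
    m = len(v)
    return [sum(v[i] * Q[i][j] for i in range(m)) for j in range(m)]


v = [0.6, 0.3, 0.1]
Q = unitary_stochastic(dft_unitary(len(v)))
u = row_vector_times(v, Q)

row_sums = [sum(row) for row in Q]
col_sums = [sum(Q[i][j] for i in range(len(v))) for j in range(len(v))]
print(u, row_sums, col_sums)
```

Because the constant vector is majorized by any vector with the same sum, this is the special case of the lemma that produces equal entries.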

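Definition 2: Let $\mathbf{a}$ and $\mathbf{b}$ be two vectors in $\mathbb{R}^m$ in non-increasing order. Then $\mathbf{a}$ is weakly
submajorized by $\mathbf{b}$ (denoted by $\mathbf{a} \prec_w \mathbf{b}$) if

$$\sum_{i=1}^{k} a(i) \leq \sum_{i=1}^{k} b(i), \quad 1 \leq k \leq m. \tag{59}$$

If the inequality is in the opposite direction in (59), then $\mathbf{a}$ is weakly supermajorized by $\mathbf{b}$ and
is denoted by $\mathbf{a} \prec^w \mathbf{b}$. Note that if $\mathbf{a} \prec_w \mathbf{b}$, then $\mathbf{b} \prec^w \mathbf{a}$ and vice versa. ■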
Lemma 6: A vector $\mathbf{a}$ is submajorized by $\mathbf{b}$ if and only if $\sum_i g(a(i)) \leq \sum_i g(b(i))$ for all
continuous, increasing convex functions $g : \mathbb{R} \mapsto \mathbb{R}$. For supermajorization, replace $g(\cdot)$ by all
continuous, decreasing convex functions. If $g(\cdot)$ is decreasing, convex and $\mathbf{a} \prec^w \mathbf{b}$, we have

$$[g(a(1)) \;\cdots\; g(a(m))] \prec_w [g(b(1)) \;\cdots\; g(b(m))]. \tag{60}$$
Proof: See [49, p. 10] for the first statement. For the second, see [49, p. 116]. ■

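Definition 3: A function $f : \mathcal{A} \mapsto \mathbb{R}$ with $\mathcal{A} \subset \mathbb{R}^m$ is said to be Schur-concave on $\mathcal{A}$ if
$\mathbf{a}, \mathbf{b} \in \mathcal{A}$ and $\mathbf{a} \prec \mathbf{b}$ implies that $f(\mathbf{a}) \geq f(\mathbf{b})$. If however, $f(\mathbf{a}) \leq f(\mathbf{b})$ for all such $\mathbf{a}$ and
$\mathbf{b}$, $f(\cdot)$ is said to be Schur-convex on $\mathcal{A}$. If a function is Schur-concave (or -convex) over $\mathbb{R}^m$,
we just say that it is Schur-concave (or -convex). Note that $f(\cdot)$ is Schur-concave if and only if
$-f(\cdot)$ is Schur-convex. ■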

Remark 2: An example of Schur-convex and Schur-concave functions is as follows. Let $\mathbf{x} =
[x_1 \cdots x_m]$ with $x_i \geq x_{i+1}$. Consider the weighted arithmetic mean of $\{x_i\}$ given by $f(\mathbf{x}) =
\sum_{i=1}^{m} w_i x_i$. The function $f(\cdot)$ is Schur-concave if $w_i \geq 0$ and $w_1 \leq \cdots \leq w_m$, since larger weights then fall on the smaller entries. If $w_i \geq 0$, but
are in the reverse order, then $f(\cdot)$ is Schur-convex. See [9, Lemma 4] for proof of this claim. It
is important to note that the sets of Schur-concave and Schur-convex functions neither partition
nor cover the space of all functions, nor are they disjoint.

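Lemma 7: Let $f : \mathbb{R} \mapsto \mathbb{R}$ be a continuous convex function. Then, $\sum_{i=1}^{m} f(x_i)$ is Schur-convex.
That is, if $\mathbf{u}$ and $\mathbf{v}$ are two $m \times 1$ vectors such that $\mathbf{u} \prec \mathbf{v}$, then $\sum_{i=1}^{m} f(u(i)) \leq \sum_{i=1}^{m} f(v(i))$.
Let $\phi : \mathbb{R}^m \mapsto \mathbb{R}$ be Schur-convex and the univariate function $\phi(\cdots, x_i, \cdots) : \mathbb{R} \mapsto \mathbb{R}$ be
monotonically decreasing for all $i$. If $\mathbf{a} \prec^w \mathbf{b}$, we have $\phi(\mathbf{a}) \leq \phi(\mathbf{b})$.

Proof: See [49, p. 11] for the first statement and [49, p. 59] for the second. ■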
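
The first statement of the lemma is easy to probe numerically. The sketch below uses two hand-picked vectors with $\mathbf{u} \prec \mathbf{v}$ and three convex functions chosen for illustration; all names and values are assumptions made for this example.

```python
import math


def sum_of_f(f, x):
    """Evaluate sum_i f(x_i) for a univariate function f."""
    return sum(f(xi) for xi in x)


# u is majorized by v: partial sums 0.4 <= 0.7, 0.75 <= 0.9, and both sum to 1.
u = [0.4, 0.35, 0.25]
v = [0.7, 0.2, 0.1]

convex_functions = {
    "square": lambda t: t * t,
    "exp": math.exp,
    "mse_like": lambda t: 1.0 / (1.0 + t),  # convex and decreasing for t > -1
}

results = {name: (sum_of_f(f, u), sum_of_f(f, v))
           for name, f in convex_functions.items()}
for name, (fu, fv) in results.items():
    print(name, fu, fv, fu <= fv)
```

The third function has the same shape as an MSE expression of the form $1/(1 + \text{gain})$, which is why Schur-convexity of such sums matters in the precoder proofs.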

Lemma 8: Let $f : \mathbb{R} \mapsto \mathbb{R}$ be a continuous convex function. Then, $\max_{i=1,\cdots,m} f(x_i)$ is
continuous and Schur-convex.
Proof: A composition of an increasing, Schur-convex function with a convex function
results in a Schur-convex function [49, p. 63]. The proof follows by noting that $\max_i x_i$ is a
function that is increasing in its arguments and is Schur-convex. ■
Lemma 9: Let $\{x_i, \; i = 1, \cdots, K\}$ and $\{y_i, \; i = 1, \cdots, K\}$ be two $K$-tuples such that $\{x_i, y_i\} > 0$
for all $i$. Then,

$$\frac{\sum_{i=1}^{K} x_i}{\sum_{i=1}^{K} y_i} \leq \frac{1}{K} \sum_{i=1}^{K} \frac{x_i}{y_i}. \tag{61}$$

Proof: We prove the lemma by induction. Consider the case $K = 2$. Without loss of
generality, let $x_1 \leq x_2$ and $y_1 \geq y_2$. We therefore have $\frac{x_1}{y_1} \leq \frac{x_2}{y_2}$, which implies that

$$x_1 + x_2 \leq \frac{x_1}{y_1}\, y_2 + \frac{x_2}{y_2}\, y_1. \tag{62}$$

Adding $x_1 + x_2$ on both sides and rearranging, we see that the statement is true for $K = 2$. Let the
statement be true for $K = n - 1$ for any ordering where $x_1 \leq \cdots \leq x_{n-1}$ and $y_1 \geq \cdots \geq y_{n-1}$.
We will show that the statement is true for the $K = n$ case, where we augment the $(n-1)$-tuples
with $x_n$ and $y_n$. Without loss of generality, we can assume that $x_1 \leq \cdots \leq x_n$ and $y_1 \geq \cdots \geq y_n$
after possible rearrangement and relabeling of indices. We have

n— 1 / \ /n—1 \ / n— 1 



(a) 



V Xi + X n < — Vi + — Vn (63) 

(b) 

1 /n—1 \ /n—1 \ -. 

(64) 




M— 1 / -, \ \ / v— >n- 1 \ \ /n—1 



rr' % \ n-1 / \^ v, ) y n \ ^— ' / 



, != 1 ^ 2/n y V n-1 y y-^y.j y n 



where (a) follows from the induction hypothesis and (b) by breaking the sum into two pieces.
The statement holds for $K = n$ upon rearrangement after using the increasing and decreasing
ordering assumption of $x_i$ and $y_i$, respectively. ■
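
A numerical sanity check of the inequality behind this proof can be sketched as follows. It assumes the lemma takes the form that the $K = 2$ step yields after rearrangement, namely $(\sum_i x_i)/(\sum_i y_i) \leq \frac{1}{K}\sum_i x_i/y_i$, together with the ordering used in the proof ($x_i$ increasing, $y_i$ decreasing). The helper names, random ranges, and the counterexample are assumptions made for illustration.

```python
import random


def ratio_of_sums(x, y):
    """(sum_i x_i) / (sum_i y_i)."""
    return sum(x) / sum(y)


def mean_of_ratios(x, y):
    """(1/K) * sum_i x_i / y_i."""
    return sum(xi / yi for xi, yi in zip(x, y)) / len(x)


rng = random.Random(0)
violations = 0
for _ in range(1000):
    K = rng.randint(2, 8)
    # Ordering assumed in the proof: x increasing, y decreasing.
    x = sorted(rng.uniform(0.1, 10.0) for _ in range(K))
    y = sorted((rng.uniform(0.1, 10.0) for _ in range(K)), reverse=True)
    if ratio_of_sums(x, y) > mean_of_ratios(x, y) + 1e-12:
        violations += 1
print("violations with opposite ordering:", violations)

# Without the opposite ordering the inequality can fail.
x_bad, y_bad = [1.0, 100.0], [1.0, 50.0]
print(ratio_of_sums(x_bad, y_bad), mean_of_ratios(x_bad, y_bad))
```

The second example suggests that the ordering step in the proof carries real weight, so it is worth confirming that the sequences to which the lemma is applied are ordered oppositely.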
Matrix Theory: The Poincaré separation theorem connects the eigenvalues of semiunitary
transformations with those of the transformed matrix [57, Cor. 4.3.16, p. 190].

Lemma 10: Let $\mathbf{A}$ be an $n \times n$ Hermitian matrix. Let $r$ be such that $1 \leq r \leq n$ and
let $\mathbf{w}_1, \cdots, \mathbf{w}_r$ be a set of orthonormal vectors in $\mathbb{C}^n$. Define $\mathbf{B} = \mathbf{W}^H \mathbf{A} \mathbf{W}$ where $\mathbf{W} =
[\mathbf{w}_1 \cdots \mathbf{w}_r]$. Let the eigenvalues of $\mathbf{A}$ and $\mathbf{B}$ be arranged in non-increasing order. Then, we
have $\lambda_k(\mathbf{B}) \leq \lambda_k(\mathbf{A})$ for all $k = 1, \cdots, r$. ■
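
The simplest instance of this lemma ($n = 2$, $r = 1$) can be checked directly: the compressed matrix is then a scalar quadratic form, which can never exceed the largest eigenvalue. The sketch below sweeps a unit vector over a grid of angles; the real symmetric test matrix and the function name are assumptions made for this example.

```python
import math


def eig_sym_2x2(a, b, d):
    """Eigenvalues (largest first) of the real symmetric matrix [[a, b], [b, d]]."""
    mean = 0.5 * (a + d)
    radius = math.hypot(0.5 * (a - d), b)
    return mean + radius, mean - radius


a, b, d = 2.0, 1.0, -1.0  # hypothetical test matrix A = [[2, 1], [1, -1]]
lam1, lam2 = eig_sym_2x2(a, b, d)

worst_gap = float("inf")
for step in range(360):
    theta = math.radians(step)
    w = (math.cos(theta), math.sin(theta))  # unit-norm column, so W is semiunitary
    # With r = 1, B = W^H A W is 1 x 1 and its only eigenvalue is this quadratic form.
    B = a * w[0] ** 2 + 2 * b * w[0] * w[1] + d * w[1] ** 2
    worst_gap = min(worst_gap, lam1 - B)

print(lam1, worst_gap)
```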
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The following lemma provides bounds for eigenvalues of sums and products of Hermitian 
matrices [57]. 

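Lemma 11: If $\mathbf{A}$ and $\mathbf{B}$ are $n \times n$ Hermitian matrices, then

$$\lambda_k(\mathbf{A})\,\lambda_{\min}(\mathbf{B}) \leq \lambda_k(\mathbf{A}\mathbf{B}) \leq \lambda_k(\mathbf{A})\,\lambda_{\max}(\mathbf{B}), \quad k = 1, \cdots, n, \tag{65}$$
$$\lambda_k(\mathbf{A}) + \lambda_{\min}(\mathbf{B}) \leq \lambda_k(\mathbf{A} + \mathbf{B}) \leq \lambda_k(\mathbf{A}) + \lambda_{\max}(\mathbf{B}), \quad k = 1, \cdots, n. \tag{66}$$

We also have

$$\sum_{k=1}^{n} \lambda_k(\mathbf{A}\mathbf{B}) \leq \sum_{k=1}^{n} \lambda_k(\mathbf{A})\,\lambda_k(\mathbf{B}). \tag{67}$$

■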

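The following lemma [58] helps in computing the determinant of partitioned matrices.
Lemma 12: If $\mathbf{X}$, $\mathbf{Y}$, $\mathbf{Z}$ and $\mathbf{W}$ are $n \times n$ matrices and $\mathbf{W}$ is invertible, we have

$$\det \begin{bmatrix} \mathbf{X} & \mathbf{Y} \\ \mathbf{Z} & \mathbf{W} \end{bmatrix} = \det(\mathbf{X} - \mathbf{Y}\mathbf{W}^{-1}\mathbf{Z}) \cdot \det(\mathbf{W}). \tag{68}$$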
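
The block-determinant identity can be verified on a small example. The sketch below uses $2 \times 2$ blocks with arbitrary illustrative entries and elementary helper functions written for this example (none of them come from the source).

```python
def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def matsub(A, B):
    """Entrywise difference A - B."""
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def det(A):
    """Determinant by Laplace expansion along the first row (fine for tiny matrices)."""
    n = len(A)
    if n == 1:
        return A[0][0]
    total = 0.0
    for j in range(n):
        minor = [row[:j] + row[j + 1:] for row in A[1:]]
        total += (-1) ** j * A[0][j] * det(minor)
    return total


def inv2(A):
    """Inverse of a 2 x 2 matrix with non-zero determinant."""
    (a, b), (c, d) = A
    D = a * d - b * c
    return [[d / D, -b / D], [-c / D, a / D]]


X = [[2.0, 1.0], [0.0, 3.0]]
Y = [[1.0, 0.0], [2.0, 1.0]]
Z = [[0.0, 1.0], [1.0, 1.0]]
W = [[4.0, 1.0], [2.0, 3.0]]  # det(W) = 10, so W is invertible

full = [X[0] + Y[0], X[1] + Y[1], Z[0] + W[0], Z[1] + W[1]]
lhs = det(full)
schur_complement = matsub(X, matmul(matmul(Y, inv2(W)), Z))
rhs = det(schur_complement) * det(W)
print(lhs, rhs)
```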



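Random Matrix Theory: We now characterize the eigenvalues of certain families of random
matrices.

Lemma 13: Let $\mathbf{X}$ be a $p \times n$ complex random matrix with i.i.d. entries of mean zero, common
variance 1 and a finite fourth moment. Consider two cases: 1) $p$ is finite and $n \rightarrow \infty$, and 2)
$\{p, n\} \rightarrow \infty$ with $p/n \rightarrow 0$. In either case, in the asymptotics of $n$, the empirical eigenvalue
distribution of $\frac{1}{2\sqrt{np}}\left(\mathbf{X}\mathbf{X}^H - n\mathbf{I}_p\right)$ converges pointwise with probability 1 to the semi-circular law $F(x)$
where,

$$F(x) = \begin{cases} 0 & \text{if } x < -1, \\ \frac{2}{\pi} \int_{-1}^{x} \sqrt{1 - y^2}\, dy & \text{if } -1 \leq x \leq 1, \\ 1 & \text{if } x > 1. \end{cases} \tag{69}$$

In particular, with probability one, we have

$$1 - 2\sqrt{\frac{p}{n}} \leq \liminf \frac{\lambda_{\min}(\mathbf{X}\mathbf{X}^H)}{n} \leq \limsup \frac{\lambda_{\max}(\mathbf{X}\mathbf{X}^H)}{n} \leq 1 + 2\sqrt{\frac{p}{n}}. \tag{70}$$

Let $\mathbf{\Lambda}$ be an $n \times n$ positive definite diagonal matrix. Under the same assumptions on $\mathbf{X}, p, n$ as
above, there exists a finite constant $\gamma_1 > 0$ (dependent on $p$ and $n$ only through $\mathbf{\Lambda}$) such that,
with probability 1

$$\frac{\sum_i \Lambda(i)}{n} - \gamma_1 \sqrt{\frac{p}{n}} \leq \liminf \frac{\lambda_{\min}(\mathbf{X}\mathbf{\Lambda}\mathbf{X}^H)}{n} \leq \limsup \frac{\lambda_{\max}(\mathbf{X}\mathbf{\Lambda}\mathbf{X}^H)}{n} \leq \frac{\sum_i \Lambda(i)}{n} + \gamma_1 \sqrt{\frac{p}{n}}.$$

On the other hand, let $\mathbf{X}$ be a $p \times n$ complex random matrix with independent entries from a
fixed probability space such that $X(i,j)$ is zero mean, has variance $\sigma_{ij}^2$ and

$$\sup_{n,p} \max_{i,j} E\left[|X(i,j)|^4\right] \leq \gamma_2 < \infty. \tag{71}$$

Also, without loss of generality, assume that $\{\sum_{j=1}^{n} \sigma_{ij}^2\}$ are arranged in decreasing order. Then
there exists a finite constant $\gamma_3 > 0$ (independent of $p, n$) such that, for all $i$

$$\frac{\sum_j \sigma_{ij}^2}{n} - \gamma_3 \sqrt{\frac{p}{n}} \leq \liminf \frac{\lambda_i(\mathbf{X}\mathbf{X}^H)}{n} \leq \limsup \frac{\lambda_i(\mathbf{X}\mathbf{X}^H)}{n} \leq \frac{\sum_j \sigma_{ij}^2}{n} + \gamma_3 \sqrt{\frac{p}{n}} \tag{72}$$

with probability 1.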

Proof: We provide an elementary proof of the claim when $p$ is finite, $n \rightarrow \infty$ and $X(i,j)$
are standard, complex Gaussian. Define the set $A_n = \left\{\omega : \frac{\lambda_{\max}(\mathbf{X}(\omega)\mathbf{\Lambda}\mathbf{X}(\omega)^H)}{n} > \frac{\sum_i \Lambda(i)}{n} + \epsilon_1 + \epsilon_2 \right\}$.
If we can show that $\sum_n \Pr(A_n) < \infty$, it follows from the Borel-Cantelli lemma [59] that
$\Pr(\limsup A_n) = 0$. By choosing $\epsilon_1$ and $\epsilon_2$ appropriately (as a function of $n$), we can establish
strict bounds on the eigenvalues.

Breaking $\mathbf{X}\mathbf{\Lambda}\mathbf{X}^H$ into a diagonal component and an off-diagonal component and using Lemma 11,
it follows via a union bound that

Pr (An) < pPr p^TO.OP-DAM > \ + 2pr /|E^,X(M)A(QX(2,0-| > ^ 
\ n / \ n 

Using a Chernoff-type bound [59], we have the following: 

e\n 2 \ 2 / e 2 n 2 c 



Pr(40 < pexp ^ 2£Li(A(;)p j + 2? exp ^-^j (73) 

for some $c > 0$. The smallest values of $\epsilon_1$ and $\epsilon_2$ that can still result in $\Pr(\limsup A_n) = 0$ are
such that



( , <0( r 2 ! = \/ SkiMl! . 1 v > 0. (74) 



Letting | 0, we have 



UXAX^ < EkM + JS^^ . 1 (75 ) 
n n V n Jn 



where $\gamma_4 > 0$ is a constant independent of $p$ and $n$. The expression for $\lambda_{\min}(\cdot)$ is symmetric
with that of $\lambda_{\max}(\cdot)$ and can be obtained similarly. The extension to the case where $\mathbf{X}$ has only
independent entries (not necessarily complex Gaussian) also proceeds via the same logic.

Since $p \rightarrow \infty$ in Case 2), the above technique is not useful in establishing the claim of the
lemma. Here, the result follows from [60], [61, Theorem 2.9, p. 623]. The generalizations with
$\mathbf{\Lambda}$ and independent entries follow via the same proof technique as in [60] and hence no proofs
are provided. The readers are referred to [61] for a brief summary of the general technique. ■
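
A small Monte Carlo experiment gives a feel for the concentration described in the lemma. The sketch below assumes $p = 2$ and i.i.d. standard complex Gaussian entries, so that the eigenvalues of $\mathbf{X}\mathbf{X}^H/n$ are available in closed form. The statement in the lemma is asymptotic, so for a finite $n$ the code only illustrates that both eigenvalues sit close to 1 on the scale $\sqrt{p/n}$. The seed, dimensions, and function names are assumptions made for this example.

```python
import math
import random


def sample_gram_2xn(n, rng):
    """Entries (g11, g22, g12) of (1/n) X X^H for a 2 x n matrix X with CN(0, 1) entries."""
    g11 = 0.0
    g22 = 0.0
    g12 = 0j
    s = math.sqrt(0.5)  # real and imaginary parts each have variance 1/2
    for _ in range(n):
        x1 = complex(rng.gauss(0.0, s), rng.gauss(0.0, s))
        x2 = complex(rng.gauss(0.0, s), rng.gauss(0.0, s))
        g11 += abs(x1) ** 2
        g22 += abs(x2) ** 2
        g12 += x1 * x2.conjugate()
    return g11 / n, g22 / n, g12 / n


def eig_hermitian_2x2(a, d, b):
    """Eigenvalues (smallest first) of [[a, b], [conj(b), d]] with real a and d."""
    mean = 0.5 * (a + d)
    radius = math.sqrt((0.5 * (a - d)) ** 2 + abs(b) ** 2)
    return mean - radius, mean + radius


rng = random.Random(1)
p, n = 2, 20000
lam_min, lam_max = eig_hermitian_2x2(*sample_gram_2xn(n, rng))
edge = 2.0 * math.sqrt(p / n)
print(lam_min, lam_max, 1.0 - edge, 1.0 + edge)
```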
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B. Proofs of Prop. \M 

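Proof of Prop. [/} Let $\mathbf{F}$ be a fixed $N_t \times M$ semiunitary precoder and define

$$\mathbf{B} \triangleq \left(\mathbf{I}_M + \frac{\rho}{M} \mathbf{F}^H \mathbf{H}^H \mathbf{H} \mathbf{F}\right)^{-1}. \tag{76}$$

From (8), note that the vector MSE is the vector of diagonal entries of $\mathbf{B}$. Following Lemma 10,
we have $\lambda_k(\mathbf{F}^H \mathbf{H}^H \mathbf{H} \mathbf{F}) \leq \lambda_k(\mathbf{H}^H \mathbf{H})$ for $k = 1, \cdots, M$. That is, the eigenvalues of $\mathbf{B}$ satisfy

$$\lambda_k(\mathbf{B}) \geq \frac{1}{1 + \frac{\rho}{M} \lambda_{M-k+1}(\mathbf{H}^H \mathbf{H})}, \quad k = 1, \cdots, M. \tag{77}$$

Denote by $\boldsymbol{\lambda}_{\mathbf{B}}$ the vector of eigenvalues of $\mathbf{B}$. The Schur-concavity of $f(\cdot)$ and the fact that
the diagonal entries of a Hermitian matrix are majorized by its eigenvalues when used with $\mathbf{B}$
results in $f(\mathrm{MSE}) \geq f(\boldsymbol{\lambda}_{\mathbf{B}})$. The monotonicity of $f(\cdot)$ when combined with (77) implies that

$$f(\mathrm{MSE}) \geq f\left(\left[\frac{1}{1 + \frac{\rho}{M} \lambda_{M-k+1}(\mathbf{H}^H \mathbf{H})}\right]_{k=1,\cdots,M}\right). \tag{78}$$
Note that the lower bound in (78) is independent of the choice of $\mathbf{F}$, and hence, also serves as a
universal lower bound. Furthermore, the choice of $\mathbf{F}$ in the statement of the proposition meets the lower bound and is hence
optimal. ■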
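Proof of Prop. ^ Let $\mathbf{F}$ be a fixed semiunitary matrix. Define the $M \times 1$ vectors $\mathbf{d}$ and $\mathbf{e}$
with $d(k) \triangleq B(k,k)$, where $\mathbf{B} = \left(\mathbf{I}_M + \frac{\rho}{M}\mathbf{F}^H\mathbf{H}^H\mathbf{H}\mathbf{F}\right)^{-1}$, and $e(k) \triangleq \frac{1}{M} \sum_{i=1}^{M} \frac{1}{1 + \frac{\rho}{M}\lambda_i(\mathbf{F}^H\mathbf{H}^H\mathbf{H}\mathbf{F})}$,
respectively. Note that $e(k)$ is equal for all $k$ and hence, from Remark 1 we have $\mathbf{d} \succ \mathbf{e}$. From
Lemma 7 we have that $\sum_{k=1}^{M} h(\cdot)$ is Schur-convex. Hence,

$$\sum_{k=1}^{M} h(d(k)) \geq \sum_{k=1}^{M} h(e(k)) = M h(e(1)). \tag{79}$$

Using Lemma 10 and the increasing property of $h(\cdot)$, we have

$$\sum_{k=1}^{M} h(d(k)) \geq M h\left(\frac{1}{M} \sum_{i=1}^{M} \frac{1}{1 + \frac{\rho}{M}\lambda_i(\mathbf{H}^H\mathbf{H})}\right). \tag{80}$$

Since the right-hand side of (80) is independent of the choice of $\mathbf{F}$, it serves as a lower bound
on the error probability.

Our goal is to show that the lower bound can be achieved and the choice of $\mathbf{F}$ that leads
to the lower bound is $\mathbf{F}_{\mathrm{opt}}$. For this, let $A$ be defined as $A = \frac{1}{M} \sum_{i=1}^{M} \frac{1}{1 + \frac{\rho}{M}\lambda_i(\mathbf{H}^H\mathbf{H})}$. Further,
define the two $M \times 1$ vectors $\mathbf{u}$ and $\mathbf{v}$ such that $u(k) = A$ for all $k$ and $v(k) = \frac{1}{1 + \frac{\rho}{M}\lambda_k(\mathbf{H}^H\mathbf{H})}$.
Since $\mathbf{u} \prec \mathbf{v}$, from Lemma 5 there exists a unitary-stochastic matrix $\mathbf{Q}$ such that $\mathbf{u} = \mathbf{v}\mathbf{Q}$ with
$Q(i,j) = |T(i,j)|^2$ for some $\mathbf{T}$ unitary. Consider the precoder $\mathbf{F}$ as given in (12). The MSE
across the data-streams with this precoder is given by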



MSEj 



P 



H"HF 









r 








k 





(81) 
(82) 



with $\Lambda(k) = \lambda_k(\mathbf{H}^H\mathbf{H})$. From the definitions of $\mathbf{T}$, $\mathbf{v}$ and the relationship $\mathbf{u} = \mathbf{v}\mathbf{Q}$, it is easy
to check that $\mathrm{MSE}_k = A$ for all $k$. Thus, with the choice of $\mathbf{F}$ as in (12), we can achieve the
lower bound in (80). ■
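Proof of Prop. & For the Schur-concave case, from Lemma 10 and (67), it can be checked
that $\mathbf{a} \prec_w \mathbf{b}$, where $a(k) = \lambda_k(\mathbf{\Lambda}_{\mathrm{fixed}} \mathbf{V}_{\mathbf{F}}^H \mathbf{H}^H \mathbf{H} \mathbf{V}_{\mathbf{F}})$ and $b(k) = \Lambda_{\mathrm{fixed}}(k)\, \lambda_k(\mathbf{H}^H\mathbf{H})$. Define
$g(y) = \frac{1}{1 + \kappa y}$ for some fixed $\kappa > 0$ and note that $g(\cdot)$ is convex and decreasing. Thus, from
Lemma 6 we have $g(\mathbf{b}) \prec_w g(\mathbf{a})$. Noting that $-f(\cdot)$ is Schur-convex and decreasing, from
Lemma 7 we have $f(g(\mathbf{a})) \geq f(g(\mathbf{b}))$. This universal lower bound is achievable by $\mathbf{F}_{\mathrm{opt}}$ as
in (17).

When $f(\cdot)$ is Schur-convex, we proceed similar to the semiunitary case. Using $g(y) = \frac{1}{1 + \kappa y}$,
from Lemma 6 we have

$$\sum_{k=1}^{M} g(b(k)) \leq \sum_{k=1}^{M} g(a(k)). \tag{83}$$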



Define u(k) = 



M 



k=i 
i 



k=l 



for all k and w(k) 



that u -<; w. That is, there exists a unitary- stochastic Q such that u = vQ. The result follows 
as before. ■ 



C. Proof of Proposition \5\ 

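To characterize the behavior of $\Delta I_1$, recall the structure of the optimal perfect CSI semiunitary precoder
(see App. B) and note from Lemma 2 that the perfect CSI unconstrained scheme corresponds
to waterfilling along the first $M$ dominant transmit singular vectors. Thus, we have

$$\Delta I_1 \cdot E_{\mathbf{H}}\left[I_{\mathrm{stat,\,semi}}(\rho)\right] = E_{\mathbf{H}}\left[\sum_{i=1}^{M} \log\left(1 + \Lambda_{\mathbf{H}}(i)\Lambda_{\mathrm{wf}}(i)\right) - \sum_{i=1}^{M} \log\left(1 + \frac{\rho}{M}\Lambda_{\mathbf{H}}(i)\right)\right] \tag{84}$$

where for each realization $\mathbf{H}$, $n_{\mathbf{H}}$ modes are excited ($1 \leq n_{\mathbf{H}} \leq M$) with power $\Lambda_{\mathrm{wf}}(i) =
\left(\mu - \frac{1}{\Lambda_{\mathbf{H}}(i)}\right)^{+}$ and the water level $\mu$ is chosen such that $\sum_{i=1}^{M} \Lambda_{\mathrm{wf}}(i) = \rho$. It can be easily
checked that $\Lambda_{\mathrm{wf}}(i)$ can be written as

$$\Lambda_{\mathrm{wf}}(i) = \frac{\rho}{n_{\mathbf{H}}} + \frac{1}{n_{\mathbf{H}}} \sum_{j=1}^{n_{\mathbf{H}}} \frac{1}{\Lambda_{\mathbf{H}}(j)} - \frac{1}{\Lambda_{\mathbf{H}}(i)}, \quad i = 1, \cdots, n_{\mathbf{H}}, \tag{85}$$
and $n_{\mathbf{H}}$ is the largest value of $k$ that satisfies:

$$\sum_{i=1}^{k} \frac{\Lambda_{\mathbf{H}}(i) - \Lambda_{\mathbf{H}}(k)}{\Lambda_{\mathbf{H}}(i)\,\Lambda_{\mathbf{H}}(k)} \leq \rho. \tag{86}$$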
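
The waterfilling rule above can be sketched in a few lines. The snippet assumes the closed form reads $\Lambda_{\mathrm{wf}}(i) = \rho/n_{\mathbf{H}} + \frac{1}{n_{\mathbf{H}}}\sum_{j \leq n_{\mathbf{H}}} 1/\Lambda_{\mathbf{H}}(j) - 1/\Lambda_{\mathbf{H}}(i)$ on the active modes, with $n_{\mathbf{H}}$ the largest $k$ for which $\sum_{i \leq k} (\Lambda_{\mathbf{H}}(i) - \Lambda_{\mathbf{H}}(k))/(\Lambda_{\mathbf{H}}(i)\Lambda_{\mathbf{H}}(k)) \leq \rho$. The function name `waterfill` and the example eigenvalues are assumptions made for illustration.

```python
def waterfill(gains, rho):
    """Waterfilling over positive channel eigenvalues `gains` with total power rho.

    Returns (powers, n_active); powers follow the non-increasing ordering of the gains.
    """
    g = sorted(gains, reverse=True)
    n_active = 1
    for k in range(1, len(g) + 1):
        # Activation test: sum_{i <= k} (g_i - g_k) / (g_i * g_k) <= rho.
        needed = sum((g[i] - g[k - 1]) / (g[i] * g[k - 1]) for i in range(k))
        if needed <= rho:
            n_active = k
    inv_sum = sum(1.0 / g[i] for i in range(n_active))
    # Closed-form power on the active modes; inactive modes get zero power.
    powers = [rho / n_active + inv_sum / n_active - 1.0 / g[i]
              for i in range(n_active)]
    powers += [0.0] * (len(g) - n_active)
    return powers, n_active


powers_low, n_low = waterfill([4.0, 1.0, 0.25], rho=1.0)
powers_high, n_high = waterfill([4.0, 1.0, 0.25], rho=100.0)
print(powers_low, n_low)
print(powers_high, n_high)
```

At low SNR the weakest mode is switched off and the allocation is visibly non-uniform, while at high SNR all modes are active and the powers approach $\rho/M$, which matches the qualitative behavior discussed for the perfect CSI precoder.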



Hence, we have 

All • #H [istat, semi(p)] < #H 



n H 



pA H (t)(M-n H ) _ i , A H (i) V^«H 1 

-'- l^j I A„0) 



pA H (i) 



(87) 



.i=l \ 1 A/ 

Using the fact that $\log(1 + x) \leq x$ for all $x > -1$, after some simplifications we can further
upper bound $\Delta I_1$ as



M 2 

Aii • Eh [i stat , semi (p)] < £ H [M - n H ] + -r- ■ E u 



p< 



M 



r— 

Ah i 



1=1 



(88) 



From (|86l) . it is easily recognized that if p > 



then n H > fc. Thus, if p > ai?n 



M 



A H (A) 



Eti and in Particular, if p > 



for some a > 1 as in the statement of the theorem, 



_Ah(M)_ 

both the terms in (1881) can be bounded by constants that depend only on the channel statistics. 
For this note that, 

M 



E u [M - n H ] < M- Pr(n H < M) < M ■ Pr 



A H (M) 



> P 



< M-Pr 



1 



A H (M) 



A H (M) 



(«) M 
< — 
" a 2 



E 



A H (M) 



E 



2 ! 



(89) 



(90) 



_A H (M)_ 

where (a) follows from Chebyshev's inequality. A trivial upper bound for the other term gives 
the desired result. ■ 



D. Proof of Theorem [7] 

It can be checked that Ai 2 can be written as 

££lilog(l + £A fc (H"H)) 



Ai 2 



E 



H 



(a) 

< En 



Er=ilog(l + ^A fc (F s ^ mi H^HF semi )) 



-Y 



(6) 1 



log(l + ^A fc (H^H)) 
M fe lo S (i + irMF^H^HF^)) 

log (l + ^A fe (A t H? d A r H iid ) 



M^ 4 



fe=i 



log(l + ^A fe (A t Hf d A r H iid ; 



(91) 
(92) 

(93) 



where (a) follows from Lemma 9 and (b) from the notations established in Sec. IV-A.
Using Lemmas 11 and 13, we have the following in the limit of $N_r, N_t, M$:



M 



A/, < 



M 



k=l 



M 



l0 § ( 1 + lfe A *( fc )^max(H^ d A r Hiid 



log (l + ^A t (A;)A min (H? d A r Hiid; 



< 



k=l 



log(l + ^A lW (l + Kl ^SfH 



log | ! + [ 



M 



fc=i 



1 



log(l + &A t (fc)) 



(94) 



(95) 



(96) 



where $\kappa_1$ is the constant from an application of Lemma 13 in this setting. The last inequality
follows by using the log-inequality and some trivial manipulations. The proof is complete. ■



E. Proof of Proposition |6| 

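We have the following well-known facts [42]:

$$I_{\mathrm{perf}}(\rho) = \log\left(1 + \rho\,\lambda_{\max}(\mathbf{H}^H\mathbf{H})\right), \quad I_{\mathrm{stat}}(\rho) = \log\left(1 + \rho \sum_{k=1}^{N_t} \lambda_k\, |\mathbf{v}_k^H \mathbf{u}_{\mathrm{stat}}|^2\right), \tag{97}$$

where $\mathbf{u}_{\mathrm{stat}}$ is an eigenvector corresponding to the dominant eigenvalue of $\mathbf{\Sigma}_t = E[\mathbf{H}^H\mathbf{H}]$,
and an eigen-decomposition of $\mathbf{H}^H\mathbf{H}$ is of the form: $\mathbf{H}^H\mathbf{H} = \sum_{k=1}^{N_t} \lambda_k \mathbf{v}_k \mathbf{v}_k^H$. The following
simplifications can then be made:

$$E_{\mathbf{H}}\left[I_{\mathrm{stat}}(\rho)\right] \cdot \Delta I_{\mathrm{bf}} = E_{\mathbf{H}}\left[\log\left(1 + \rho\, \frac{\lambda_1 - \sum_k \lambda_k |\mathbf{v}_k^H \mathbf{u}_{\mathrm{stat}}|^2}{1 + \rho \sum_k \lambda_k |\mathbf{v}_k^H \mathbf{u}_{\mathrm{stat}}|^2}\right)\right] \tag{98}$$
$$\overset{(a)}{\leq} E_{\mathbf{H}}\left[\log\left(1 + \rho\,\lambda_1 \left(1 - |\mathbf{v}_1^H \mathbf{u}_{\mathrm{stat}}|^2\right)\right)\right] \tag{99}$$
$$\overset{(b)}{\leq} \log\left(1 + \rho\, E_{\mathbf{H}}\left[\lambda_{\max}(\mathbf{H}^H\mathbf{H})\left(1 - |\mathbf{v}_1^H \mathbf{u}_{\mathrm{stat}}|^2\right)\right]\right) \tag{100}$$
$$\overset{(c)}{\leq} \log\left(1 + \rho \cdot \sqrt{E_{\mathbf{H}}\left[\left(1 - |\mathbf{v}_1^H \mathbf{u}_{\mathrm{stat}}|^2\right)^2\right] \cdot E_{\mathbf{H}}\left[\lambda_{\max}^2(\mathbf{H}^H\mathbf{H})\right]}\right) \tag{101}$$

where (a) follows trivially by ignoring the contribution of $k = 2, \cdots, N_t$ in the summation,
(b) follows from Jensen's inequality, and (c) from the Cauchy-Schwarz inequality. We use the
eigenvector perturbation theory developed in [14] and in particular, the bound in [14, Eqn. (16)]
to establish that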

$$E_{\mathbf{H}}\left[\left(1 - |\mathbf{v}_1^H \mathbf{u}_{\mathrm{stat}}|^2\right)^2\right] \leq \kappa_3' \cdot \frac{N_t \log(N_r)}{N_r} \tag{102}$$

for some appropriate constant $\kappa_3'$ that is independent of the channel statistics and dimensions.
Using Lemma 11 and Lemma 13, the conclusion in (33) follows for the relative asymptotics case.
For the proportional growth case, an upper bound needs to be established for $E_{\mathbf{H}}\left[\lambda_{\max}^2(\mathbf{H}^H\mathbf{H})\right]$.
See [62] for an upper bound technique that builds on the work by [63], which results in the
statement of the theorem. ■
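
The first step of this proof can be illustrated for a single channel realization. The sketch below takes the eigenvalues of $\mathbf{H}^H\mathbf{H}$ and the squared overlaps $|\mathbf{v}_k^H \mathbf{u}_{\mathrm{stat}}|^2$ as given inputs, and compares the mutual information gap $I_{\mathrm{perf}} - I_{\mathrm{stat}}$ with the bound $\log(1 + \rho\lambda_1(1 - |\mathbf{v}_1^H \mathbf{u}_{\mathrm{stat}}|^2))$ obtained by dropping the $k \geq 2$ terms. The numerical values and the function name are assumptions made for illustration.

```python
import math


def mi_gap_and_bound(eigvals, overlaps_sq, rho):
    """Per-realization beamforming gap and its bound.

    eigvals: eigenvalues of H^H H in non-increasing order.
    overlaps_sq: squared overlaps |v_k^H u_stat|^2, which sum to 1 for a unit-norm u_stat.
    """
    lam1 = eigvals[0]
    effective_gain = sum(l * c for l, c in zip(eigvals, overlaps_sq))
    i_perf = math.log(1.0 + rho * lam1)
    i_stat = math.log(1.0 + rho * effective_gain)
    gap = i_perf - i_stat
    # Bound from keeping only the k = 1 term of the effective gain.
    bound = math.log(1.0 + rho * lam1 * (1.0 - overlaps_sq[0]))
    return gap, bound


gap, bound = mi_gap_and_bound([3.0, 1.0, 0.5], [0.9, 0.07, 0.03], rho=10.0)
aligned_gap, aligned_bound = mi_gap_and_bound([3.0, 1.0, 0.5], [1.0, 0.0, 0.0], rho=10.0)
print(gap, bound)
print(aligned_gap, aligned_bound)
```

When the statistical beamformer is perfectly aligned with the dominant eigenvector, both the gap and the bound vanish, which is the intuition behind controlling the loss through the eigenvector perturbation term.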



F. Proof of Theorem \3\ 

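As in App. D, we can write $\Delta I_2$ as

$$\Delta I_2 = \frac{E_{\mathbf{H}}\left[I_{\mathrm{perf,\,semi}}(\rho)\right]}{E_{\mathbf{H}}\left[I_{\mathrm{stat,\,semi}}(\rho)\right]} - 1 \tag{103}$$
$$= \frac{E_{\mathbf{H}}\left[\sum_{k=1}^{M} \log\left(1 + \frac{\rho}{M}\lambda_k(\mathbf{H}^H\mathbf{H})\right)\right]}{E_{\mathbf{H}}\left[\sum_{k=1}^{M} \log\left(1 + \frac{\rho}{M\rho_c}\lambda_k(\mathbf{\Lambda}_t \mathbf{H}_{\mathrm{iid}}^H \mathbf{\Lambda}_r \mathbf{H}_{\mathrm{iid}})\right)\right]} - 1. \tag{104}$$

The denominator of (104) can be computed following the method in [50, Theorem 1] and equals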

^H[/stat,semi(p)] = ^ log ( 1 + ~ fnA t (k) ) + ^ log ( 1 + — jUiA r (&) J - — ^1, (105) 

k=l V Pc / k=1 \ Pc J Pc 



where pi and pi satisfy the recursive equations 



1 M A 

1 v - A 



A r (k) 



Pi 



A t (k) 

Pi-At(k) 



(106) 



k = l Pc^ 1 n ^ k = l Pc 

A simple lower bound for $E_{\mathbf{H}}\left[I_{\mathrm{stat,\,semi}}(\rho)\right]$ is obtained by using $\log(1 + x) \geq \log(x)$ for $x > 0$:



£H^stat,semi(p)] > Yjlog I — A t (k) A r (k) 



(107) 



We now establish that the above bound is order-optimal as a increases (with p = a A ™ M ^ ), by 
lower bounding [i\fi\. We can easily show that 

Pc h 



Pi 



> 



M 1 + 



A r (l) 
A t (M) 



Pi > 



Pc 



M 1 + gh A *(!) 
1 ^ aU2 A t (M) 



and hence, 



where C\ = b 



p ~ olC\ 
1 > — pipi > — — — — — , 
Pc 1 + ol[C x + C 2 ) 



(108) 



(109) 



A r (M) 
A t (M) 



and Co 



Mi) 

■At(M) 



. Tightness of the bound in (107) follows from using the
fact that $\log(1 + x) \leq \log(x) + \frac{1}{x}$, $x > 0$.
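
For completeness, the elementary inequality used here follows in one line from $\log(1 + t) \leq t$ (a short derivation added as a reading aid):

```latex
\log(1 + x) \;=\; \log(x) + \log\!\left(1 + \frac{1}{x}\right)
\;\leq\; \log(x) + \frac{1}{x}, \qquad x > 0,
\qquad\text{so that}\qquad
0 \;\leq\; \log(1 + x) - \log(x) \;\leq\; \frac{1}{x}.
```

The gap between $\log(1+x)$ and $\log(x)$ therefore shrinks at least as fast as $1/x$, which is what makes the lower bound tight when the argument of the logarithm is large.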
Combining the above relationships, we have 

paC\ 



^H[/stat,semi(p)] > M log 



e (1 + a(d + C 2 )) 



\ M ( 

' k=l ^ 



A t (fc)A r (fc) 

Pc 



(110) 
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Proceeding in the same way, one can obtain an upper bound for -E , H[^ P erf,semi(p)]- Since the main 
goal here is to obtain the trends of $\Delta I_2$, we find it convenient and less cumbersome to replace
the upper bound with an approximation ($\log(1 + x) \approx \log(x)$) by ignoring the term that decays
as $\frac{1}{x}$. Thus, we have



£h[/, 



perf , semi 



(P)] 



M 



k=l 



Afc(AiH^A r H iid ^ 



(a) 



A 



B 



< Mlog(^) +mm(A,B) 



(111) 

(112) 



ME 



ME] 



H 



log 



log 



Amax(Hiic)A t H iid ] 



M 



+ ^log(A t (fc)) (H3) 



Pc 



k=l 
M 



+ ^log(A r (A;)), (114) 



k=l 



where in (a) we have used Lemma 10. Combining (110) and (112), we have the statement of
the theorem. ■



G. Proof of Proposition [7| 

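First, we write $\Delta P_{\mathrm{semi}}$ in terms of SINR of the individual data-streams by using $P_{k,\cdot} =
\alpha\, Q\left(\beta\, (\mathrm{SINR}_{k,\cdot})^{1/2}\right)$ and the expression for $\mathrm{SINR}_{k,\cdot}$ in (8). Then, we use the following bound
for $Q(x)$:

$$\frac{\exp(-x^2/2)}{x\sqrt{2\pi}} \left(1 - \frac{1}{x^2}\right) \leq Q(x) \leq \frac{\exp(-x^2/2)}{x\sqrt{2\pi}} \tag{115}$$
to establish the expression in (41). It is straightforward to check that

$$\mathrm{SINR}_{k,\,\mathrm{perf,\,unconst}} = \Lambda_{\mathrm{wf}}(k)\, \lambda_k(\mathbf{H}^H\mathbf{H}), \tag{116}$$

where the waterfilling power allocation $\{\Lambda_{\mathrm{wf}}(k)\}$ is as in (85) (see App. C) and normalized to

$$\sum_{k=1}^{M} \Lambda_{\mathrm{wf}}(k) = \rho. \tag{117}$$

Similarly, we have

$$\mathrm{SINR}_{k,\,\mathrm{stat,\,semi}} = \frac{1}{[\mathbf{G}^{-1}]_k} - 1 = \frac{\det(\mathbf{G})}{[\mathrm{adj}(\mathbf{G})]_k} - 1, \tag{118}$$
$$\mathbf{G} = \mathbf{I}_M + \frac{\rho}{M} \mathbf{F}_{\mathrm{semi}}^H \mathbf{H}^H \mathbf{H} \mathbf{F}_{\mathrm{semi}} = \mathbf{I}_M + \frac{\rho}{M\rho_c} \mathbf{\Lambda}_t^{1/2} \mathbf{H}_{\mathrm{iid}}^H \mathbf{\Lambda}_r \mathbf{H}_{\mathrm{iid}} \mathbf{\Lambda}_t^{1/2}. \tag{119}$$
The matrix $\mathrm{adj}(\mathbf{G})$ refers to the adjoint (adjugate) of $\mathbf{G}$, and $[\mathbf{G}^{-1}]_k$ and $[\mathrm{adj}(\mathbf{G})]_k$ refer to the $k$-th diagonal
entries of $\mathbf{G}^{-1}$ and $\mathrm{adj}(\mathbf{G})$, respectively. Using the definition of adjoint of a matrix, we have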

[adj(G)] fc = det(l M _ 1 + ^-A, 1/2 Hf d A r H iid A^, (120) 

where $\mathbf{\Lambda}_t$ and $\mathbf{H}_{\mathrm{iid}}$ are as per the notations established in Sec. IV-A. The expression for $\Delta\mathrm{SINR}_k$
in the statement of the proposition follows immediately. ■
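
The two-sided bound on $Q(x)$ used in this proof, with the Gaussian tail sandwiched between $\frac{e^{-x^2/2}}{x\sqrt{2\pi}}\left(1 - \frac{1}{x^2}\right)$ and $\frac{e^{-x^2/2}}{x\sqrt{2\pi}}$, can be checked numerically with the standard library. The function names and the grid of test points are assumptions made for this sketch.

```python
import math


def q_function(x):
    """Gaussian tail probability Q(x) = 0.5 * erfc(x / sqrt(2))."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def q_bounds(x):
    """Lower and upper bounds on Q(x) for x > 0."""
    upper = math.exp(-x * x / 2.0) / (x * math.sqrt(2.0 * math.pi))
    lower = upper * (1.0 - 1.0 / (x * x))
    return lower, upper


checks = []
for x in [1.0, 2.0, 3.0, 5.0]:
    lower, upper = q_bounds(x)
    q = q_function(x)
    checks.append((x, lower, q, upper))
    print(x, lower, q, upper, upper / q)
```

The ratio of the upper bound to $Q(x)$ approaches 1 as $x$ grows, so the bounds are informative mainly in the high-SINR regime.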



H. Proof of Theorem H] 

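We have the following upper bound for $\mathrm{SINR}_{k,\,\mathrm{perf,\,unconst}}$:

$$\mathrm{SINR}_{k,\,\mathrm{perf,\,unconst}} = \Lambda_{\mathrm{wf}}(k) \cdot \frac{\lambda_k\left(\mathbf{\Lambda}_t \mathbf{H}_{\mathrm{iid}}^H \mathbf{\Lambda}_r \mathbf{H}_{\mathrm{iid}}\right)}{\rho_c} \tag{121}$$
$$\overset{(a)}{\leq} \Lambda_{\mathrm{wf}}(k)\, \Lambda_t(k) \cdot \frac{\lambda_{\max}\left(\mathbf{H}_{\mathrm{iid}}^H \mathbf{\Lambda}_r \mathbf{H}_{\mathrm{iid}}\right)}{\rho_c}, \tag{122}$$

where (a) follows from Lemma 11. To compute $\mathrm{SINR}_{k,\,\mathrm{stat,\,semi}}$, note that $\det(\mathbf{G})$, where $\mathbf{G}$ is as
in (119), can be written as

$$\det(\mathbf{G}) = \prod_{i=1}^{M} \left(1 + \frac{\rho}{M\rho_c} \lambda_i\left(\mathbf{\Lambda}_t \mathbf{H}_{\mathrm{iid}}^H \mathbf{\Lambda}_r \mathbf{H}_{\mathrm{iid}}\right)\right) \tag{123}$$
$$\overset{(a)}{\geq} \prod_{i=1}^{M} \left(1 + \frac{\rho}{M\rho_c} \Lambda_t(i)\, \lambda_{\min}\left(\mathbf{H}_{\mathrm{iid}}^H \mathbf{\Lambda}_r \mathbf{H}_{\mathrm{iid}}\right)\right), \tag{124}$$

with (a) following from Lemma 11. Similarly, we have

$$[\mathrm{adj}(\mathbf{G})]_k = \prod_{j=1}^{M-1} \left(1 + \frac{\rho}{M\rho_c} \lambda_j\left(\mathbf{\Lambda}_t \mathbf{H}_{\mathrm{iid}}^H \mathbf{\Lambda}_r \mathbf{H}_{\mathrm{iid}}\right)\right) \tag{125}$$
$$\leq \prod_{j=1,\, j \neq k}^{M} \left(1 + \frac{\rho}{M\rho_c} \Lambda_t(j)\, \lambda_{\max}\left(\mathbf{H}_{\mathrm{iid}}^H \mathbf{\Lambda}_r \mathbf{H}_{\mathrm{iid}}\right)\right). \tag{126}$$
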

Using Lemma 13 from App. A in (118) and (122), the following bounds hold with probability
1 (in the limit of $N_r, N_t, M$) for $\mathrm{SINR}_{k,\,\mathrm{perf,\,unconst}}$ and $\mathrm{SINR}_{k,\,\mathrm{stat,\,semi}}$:

SINR fciPerf , unconst < A^(k)At(k)- + I ' ^ 

y 7 r V A r j 

l + SINR feiStat , semi > 3 \ \ V r— — — — \ ~\~ (128) 
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for some universal constant C\ obtained from Lemma [131 If p is such that p > a A ™ M ^ , we can 
trivially lower bound SINR fe;Stat semi as 



l + SINR fciStat , semi > [ l + jL At (k) ( 



(a) 

> 




I _i_ I _ Qx I M- 

a 7r V ^r- 



M-l 



1 , 1 , Ci M-l 

1 + « + 7r V 



(129) 



1 + (M-1) J- 



1 _ Ci /_M 

7r V JVr 



l + 2(M-l)(i + f 



(130) 



where (a) follows from the fact that $1 + ax \leq (1 + x)^a \leq 1 + 2ax$ for $x$ sufficiently small and
$a > 0$. After some routine manipulations, $\Delta\mathrm{SINR}_k$ can be bounded as



, . ^ i 1 3Ci M 

ASINR fe < M - + 



a 7r 



f) + A, W M)^)( 1 + f^) 



.1/ 



M 



■f-A^*) - 1 + 



a 



7rV /iv; 



: (2ViV t + VM) + 



Ci 



J IrVK 



(3MVM + V^"t 



a 



pA f (fc) 
M 



) + A f (*) • (A^ (k) - ^ 



pA f (fc) 

7r 



■ o 



M + ^N t 



.(131) 



We now use the facts that y/1 + x < 1 + f for any x positive, and ^z - i s upper bounded by 
1 + 2x as long as x < \ for the terms J 1+ ^ |M ^ SINRfc - and 1 \ respectively. The 



term exp 



/3 2 ASINR fc 



/3 2siNR fc, perf, unconst 

is bounded by using the fact that e x can be bounded by 1 + ax for some 



a > 1 in the small x regime. The combination of the above facts yields 



AP • < 

<—*- L semi 



M/3 2 



:E 



H 



' Af 

E 



1 



n 2 M ft 1 „ 



' M 
.fc=l 



P/3 2 Ef=iA t (A;) /l , 1 



+ 



M 



+ — O 

a 7r 



^ + VM 



(132) 



up to a multiplicative scaling constant on the right side. For the first term, we lower
bound $\Lambda_{\mathrm{wf}}(k)$ from (85) by

1 



A wf (fc) > 



P 



1 («) p 



i H Ah(*) M At(A;) ^ 
where (a) follows from Lemma 13. For the third term, we have



Qs, Mi 

7r V Nr 



(133) 



E 



H 



' M 

fc=i 



A? 



< M + p^ A t (£;) • £ H 



fc=i 





" 1 " 








--) 




M J 



(134) 
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Finally, we have 



E\ 



H 



1 1 



< 1 



1 

M 



E 



Pr(n H < M) < 



A H (M) 



E 



2 ■ 



(135) 



_A H (M)_ 

where the second inequality follows from the bound in (90). Combining these facts, we have



AP ■ < 

<—*- L semi _ 



M 

J_ V - 

$2^ pK t {k) 

M k=l M 



1 a 



/ 



+ 



M 



1 1 

- + — 

a cr 



i 



A H (M) 



\ 



E 



i 



A H (M) 



+ — O 



>Nt + VM 



(136) 



Thus the proof is complete. 
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